Abstract

The  paper  revisits  Wald's  (1947)  sequential  experimentation  paradigm,  now 
assuming  that  an  impatient  decision  maker  can  run  variable-size  experiments 
each  period  at  some  increasing  and  strictly  convex  cost  before  finally  choosing 
an  irreversible  action.  We  translate  this  natural  discrete  time  experimentation 
story  into  a  tractable  control  of  variance  for  a  continuous  time  diffusion.  Here 
we  robustly  characterize  the  optimal  experimentation  level:  It  is  rising  in  the 
confidence  about  the  project  outcome,  and  for  not  very  convex  cost  functions, 
the  random  process  of  experimentation  levels  has  a  positive  drift  over  time.  We 
also  explore  several  parametric  shifts  unique  to  our  framework.  Among  them, 
we  discover  what  is  arguably  an  'anti-folk'  result:  Where  the  experimentation 
level  is  positive,  it  is  often  higher  for  a  more  impatient  decision  maker. 

This  paper  more  generally  suggests  that  a  long-sought  economic  paradigm 
that  delivers  a  sensible  law  of  demand  for  information  is  our  dynamic  one  — 
namely,  allowing  the  decision  maker  an  eternal  repurchase  (resample)  option. 
1.   INTRODUCTION

This paper revisits a classic decision theory contribution around its semicentennial:
Wald's (1947a) sequential probability ratio test.¹ More than any other work, this has
defined  the  paradigm  of  optimal  sequential  experimentation,  either  statistical  or  Bayesian. 
For  it  tackled  afresh  the  simplest  of  decision  problems  —  for  instance,  choosing  among  two 
actions  (perhaps  accepting  hypotheses),  each  optimal  in  one  of  two  states  of  the  world. 

In his Bayesian formulation, Wald posits that the decision maker (DM), ever uncertain
of the state, can buy multiple i.i.d. informative signals at constant marginal cost. Wald
shows that the DM should purchase sequentially, and act when sufficiently convinced of
one  state.  Yet  we  venture  that  the  experimental  schedule  should  accelerate  when  homo 
economicus  runs  the  laboratory  in  real  time.  We  offer  here  a  compelling,  economically- 
motivated  twist  along  these  lines  that  also  creates  a  pure  theory  of  dynamic  R&D.  In  our 
spin on Wald's tale, the DM is assumed impatient, but may elect a variable-size experiment
each  period  only  at  an  increasing,  strictly  convex  cost  of  information.  We  argue  that  this 
richer  experimentation  story  provides  some  intriguing  new  testable  implications,  and  that 
it  also  points  economists  to  a  long-sought  well-behaved  theory  of  information  demand. 

This  paper  has  three  goals,  attacked  in  sequence  after  an  essential  literature  review. 
First, we motivate our impatient DM and convex cost spin on Wald's discrete time tale,
and then introduce and solve a tractable continuous time equivalent model of
experimentation, which we argue is a control of variance for a diffusion with uncertain mean.

Second,  we  investigate  the  robust  character  of  the  experimentation  level.  One  might 
suppose that it peaks when the DM has middling, elastic beliefs over the state. In fact,
we  find  the  opposite:  Given  a  convex  cost  function,  experimentation  grows  in  the  expected 
payoff,  which  is  greatest  at  extreme  beliefs.  For  example,  R&D  levels  are  least  when  one  is 
most discouraged, peaking just prior to project approval. We argue that the DM acts like
a  neoclassical  competitive  firm  producing  information  at  an  increasing  marginal  cost,  with 
an  increasing  "producer  surplus."  Optimal  stopping  demands  that  this  surplus  equal  the 
cost  of  delaying  the  final  decision,  i.e.  the  portion  of  current  value  killed  by  discounting. 
Ipso  facto,  the  research  level  that  generates  this  surplus  rises  in  the  value.  We  also  show 
that the level is convex in beliefs for not too convex costs, and so drifts up over time.

Finally,  we  explore  how  experimentation  level  responds  to  some  natural  parametric 
shifts.  Among  our  findings,  we  discover  that  contrary  to  established  folk  wisdom,  a  more 
impatient DM often experiments at a higher level provided that he still experiments.


¹ The original source is a 1943 war-classified mimeo by Wald. Wald (1945a) and (1945b) were the first
published  versions  of  the  program.  In  his  (1947)  preamble,  he  credits  Milton  Friedman  and  Allen  Wallis 
for  proposing  sequential  analysis.  Wallis  details  this  history  in  his  (1980)  retrospective  paper. 


2.   THREE  RELATED  LITERATURE  THREADS 

2.1  Statistical  Testing 

The (1933) Neyman-Pearson theorem says that the best critical test of two competing
hypotheses given a fixed sample has the form: accept $H_0$ (vs. $H_1$) iff the sample likelihood
ratio exceeds a threshold. Wald and Wolfowitz (1948) proved that the sequential probability
ratio test (SPRT) is optimal, minimizing the expected number of observations for a
given power, even given either hypothesis. Arrow, Blackwell, and Girshick (1949) further
developed this idea, in one of the earliest contributions to dynamic programming.

The  SPRT  has  been  explored  and  generalized  by  several  authors  (see  Chernoff  (1972)). 
But  in  a  half  century,  surprisingly  no  one  has  characterized  the  optimal  sample  size  or 
its  dynamics  with  discounting,  where  a  unit  sample  size  is  no  longer  optimal  —  our  main 
contribution.  Cressie  and  Morgan  (1993),  a  recent  sampling  from  the  frontier,  prove  that 
a  variable  size  probability  ratio  test  is  unconditionally  optimal  among  sequential  design 
procedures, for instance, while an SPRT is best with superadditive costs and no discounting.

2.2  Optimal  Experimentation 

Optimal  experimentation  was  born  out  of  sequential  analysis.  Among  the  canonical 
Bayesian  learning  models,  those  with  a  discrete  action  space  are  essentially  stopping  rule 
problems,  or  more  generally,  bandit  models;  continuous  action  models  are  richer,  and  not 
so  easily  pigeon-holed.  Our  model  is  rare  as  it  grafts  together  these  two  contexts:  variable 
experimentation with an eventual binary stopping decision. Crucially, because our DM
makes  a  pure  information  purchase  until  stopping,  it  differs  from  all  other  experimentation 
models  where  myopic  expected  stage  payoffs  are  themselves  informative  random  signals; 
here,  payoffs  are  known  and  negative  (the  information  cost),  until  the  final  decision  stage. 

This paper investigates the level and drift of experimentation, as opposed to the
long-run question, 'Is learning complete?'² The literature has also studied three main short-run
questions. First, for finite action models like bandits, there has been work on the sequential
ordering of experiments, or stochastic scheduling. Second, the direction of experimentation
has been explored in continuous action models like monopoly pricing.³ Finally, the secular
diminishing trend in experimentation has been remarked in papers on search with learning
(e.g. rising reservation prices). This comes closest to our thrust, and yet runs exactly
counter  to  the  rising  experimentation  levels  that  we  find.  For  with  price  search,  the 
information  purchase  is  blended  with,  and  thus  colored  by,  immediate  payoff  concerns. 


² See, for instance, Easley and Kiefer (1988) or Kihlstrom, Mirman, and Postlewaite (1984).
³ McLennan (1984) and Trefler (1993) are good examples. Keller and Rady (1997), who posit a randomly
shifting demand curve, assume continuous time learning; it is therefore also a technically relevant paper.


2.3  Research  and  Development 

We  interpret  a  special  case  of  our  analysis  as  a  pure  theory  of  dynamic  R&D.  While  this 
is obviously a well-studied subject, other work differs in a key dimension. During 1971-82,
Kamien and Schwartz modelled R&D as (very roughly) covering a possibly uncertain
distance  in  'progress  space',  given  cost  and  effort  functions.  Dutta  (1997)  has  recently 
assumed  a  related  budget  constrained  model.  It  may  help  to  imagine  extant  work  as 
capturing  'D',  and  ours  'R'.  For  in  our  setting,  the  actual  state  of  the  world  ("Is  this 
feasible?") always hangs in the balance, while these papers posit a goal that is eventually
attainable,  if  desired.  We  cast  research  as  optimal  learning  rather  than  resource  allocation; 
this  is  a  theory  of  science  (the  search  for  truth)  and  not  engineering  (its  implementation). 
An  eventual  comprehensive  theory  of  dynamic  R&D  will  no  doubt  embed  both  phases. 

3.  MOTIVATING  THE  CONTINUOUS  TIME  MODEL 

3.1   Bayesian  Sequential  Analysis 

A. The Final Static Decision Problem. In a standard problem, the DM eventually
must choose between two actions: $a = A, B$ pays $\pi_a^\theta$ in the state of the world $\theta = L, H$.
Action A is better in state L, and B in state H, and the DM does not know the state.

Assume that the DM is Bayesian and risk neutral (simply treat utilities as payoffs).
If the DM's prior belief on H is $p$, then his expected payoff for action $a$ is affine in $p$,
say $\pi_a(p) = p\pi_a^H + (1-p)\pi_a^L$. The DM is indifferent between A and B at some belief $\hat{p}$.
We also admit the costless option of never deciding, and thus need a default null action
yielding constant zero payoff. His static value is then $\pi(p) = \max(\pi_A(p), \pi_B(p), 0)$. Given
the null action, we can assume WLOG (without loss of generality) that action B optimally
pays a positive payoff $\pi_B^H > 0$. If state L is dominated, then $\pi(p)$ is always increasing in $p$.

The special case of one risky and one safe action conveniently captures a stylized R&D
problem: action B means 'building' a costly new prototype, and action A 'abandoning' it.
If the DM builds, it might or might not work: payoffs are $h = \pi_B(1) > 0$ or $\ell = \pi_B(0) < 0$
in states $\theta = H$ and $\theta = L$. The DM earns zero regardless if he abandons: $\pi_A(p) = 0$. So
$\pi(p) = \max(0, hp + \ell(1-p))$, and the DM invests iff $p > \hat{p} = -\ell/(h-\ell)$.
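A minimal numerical sketch of this build-or-abandon payoff may help fix ideas. The payoff values (a success payoff of 3 and a failure payoff of -1) and the function names are illustrative assumptions, not taken from the text.

```python
def static_payoff(p, h, l):
    """Static payoff max(0, h*p + l*(1-p)): build if worthwhile, else abandon."""
    return max(0.0, h * p + l * (1.0 - p))


def investment_threshold(h, l):
    """Belief at which building exactly breaks even: -l / (h - l)."""
    return -l / (h - l)


# Illustrative payoffs (assumed): success pays h > 0, failure pays l < 0.
h, l = 3.0, -1.0
p_hat = investment_threshold(h, l)
payoff_low = static_payoff(0.1, h, l)   # below the threshold: abandon
payoff_high = static_payoff(0.5, h, l)  # above the threshold: build
```

With these numbers the break-even belief is 1/4: the DM abandons at p = 0.1 and builds at p = 0.5.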

B. Information Acquisition. Before choosing an action, let the DM initially acquire
informative signals of $\theta$ at a fixed unit cost. Assume he maximizes the expected final reward
less costs incurred. As Wald and Wolfowitz proved that sequential purchases are optimal,
this is a pure optimal stopping exercise: The DM quits at the stopping time $T$ and chooses
action A (or B, or the null action, i.e. quits) with posterior $p \le \underline{p}$ (or $p \ge \bar{p}$, or $p \in [\underline{p}_0, \bar{p}_0]$).


3.2  Discrete  Time  Experimentation  with  Impatience  and  Convex  Costs 

Our  goal  is  to  extend  Wald's  setting  along  two  dimensions:  payoff  discounting  and  cost 
convexity of experimentation. These assumptions are of particular interest for economists,
less  so  for  statisticians.  This  may  explain  why  this  stone  has  been  left  unturned. 

First assume an impatient DM, who maximizes the expected present discounted value
of wealth. Faced with the time cost of delay, the DM does not necessarily wish to proceed
purely sequentially, but may opt to stack his information purchases. After seeing his signal
outcomes in any period, he either purchases more, or stops and chooses an action. In period
$k$, the DM may buy $N_k$ i.i.d. signals $X_1, \ldots, X_{N_k}$ at cost $C(N_k)$, with $C(0) \ge 0$. One
might venture that running a lab incurs a daily rent, independent of the experimentation
level. In this case, there is a positive fixed flow experimentation cost $C(0) > 0$. But the
analogy with Wald's setting is obviously closest with no fixed costs, $C(0) = 0$.

Next assume strictly convex information costs within a period; this fosters more equal
purchases across periods, reinforcing Wald's sequential conclusion, absent discounting.
Such an assumption makes economic sense on two grounds. First, plausibly not all
researchers are equally talented in producing information. More intensive information search
then must draw on the efforts of less capable researchers. Second, as with non-Bayesian
inventive activity, since contemporaneously-produced knowledge is based on the same
current stock, identical or similar discoveries are not rare:⁴ Even if research laboratories are
created at constant cost, different labs may well expend resources duplicating results.
Likewise, concurrent Bayesian information tends to be correlated, as it grows increasingly hard
to produce i.i.d. signals. Then note that a constant marginal cost for correlated information
intuitively corresponds to an increasing marginal cost of independent information.

3.3  Developing  the  Continuous  Time  Experimentation  Model 

Although  variable  intensity  experimentation  is  easily  understood  and  formulated  in 
discrete  time  using  dynamic  programming,  the  solution  is  intractable  even  for  the  very 
simplest  signal  structures.  Perhaps  this  explains  why  no  one  has  seriously  pursued  it.  We 
now  describe  a  tractable  continuous  time  learning  paradigm  that  captures  the  quintessence 
of  the  discrete  time  story.  Below  we  sketch  and  motivate  it  and  focus  on  its  economic 
substance.  Its  recursive  solution  is  found  in  section  4.2,  and  a  formal  justification  in 
Appendix B. In a work in progress, we argue that the model and its solution are the limit
of  a  rich  class  of  discrete  time  models  with  a  vanishing  time  interval  between  experiments. 


⁴ A recurring theme in science is that great minds simultaneously think alike (e.g. Newton and Leibniz'
codevelopment  of  calculus).  Indeed,  those  seeing  further  often  stand  on  the  shoulders  of  the  same  giants. 


A. The Signal Process. In discrete time, choosing the number of i.i.d. signals to
buy, each with distinct state-dependent means, is an apt description of variable intensity
experimentation. For instance, if $X$ has mean $\pm\mu$ in states H, L, then experiments offer a
noisy but informative glimpse of the signal mean. In fact, the very goal of experimentation
is to infer this mean, for in so doing, the DM learns the state. Since the average signal
$\bar{X} = \sum_i X_i/N$ is sufficient for the unobserved mean, greater experimentation only serves
to decrease the variance of $\bar{X}$: Doubling the sample size precisely halves the variance.
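The variance claim can be written out in one line. This is a sketch assuming that, conditional on the state, the signals are independent with a common finite variance, written $\sigma^2$ here (using this symbol for the discrete signals is our assumption):

```latex
\operatorname{Var}(\bar{X}_N)
  = \operatorname{Var}\Big(\frac{1}{N}\sum_{i=1}^{N} X_i\Big)
  = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{N},
\qquad\text{so}\qquad
\operatorname{Var}(\bar{X}_{2N}) = \frac{\sigma^2}{2N}
  = \tfrac{1}{2}\operatorname{Var}(\bar{X}_N).
```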

Motivated by this general observation, we model continuous time experimentation as
the control of variance of a diffusion observation process $(x_t)$. For a fixed control, $(x_t)$
is a Brownian motion with constant uncertain drift and known variance. Nature chooses
its drift, $\mu^\theta$ in state $\theta$, where $\mu^H = -\mu^L = \mu > 0$, while the DM controls its flow variance
$\sigma^2/n_t$, with the intensity $n_t$. The observation diffusion process thus solves the stochastic
differential equation (SDE)

$$dx_t^\theta = \mu^\theta\,dt + \frac{\sigma}{\sqrt{n_t}}\,dW_t \qquad (1)$$

in state $\theta$. Here, we think of $n_t$ as the flow of information purchases, and call it the
experimentation level or intensity. Doubling $n_t$ halves the "variance" of $dx_t^\theta$, yielding a
doubly informative time-$t$ experiment. As usual, $dW_t \sim \mathcal{N}(0, dt)$ is the Wiener increment,
while the control $n_t$ depends on the observation and intensity history $(x_s, 0 \le s \le t) \cup
(n_s, 0 \le s \le t)$. It is a feedback and not open loop control, not decided at time 0.⁵
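To visualize the observation process, one can discretize it by an Euler-Maruyama scheme. The sketch below is only illustrative: the function name, the step size, the parameter values, and the restriction to Markov feedback rules (intensity as a function of the current time and observation only) are all assumptions made for the example, not part of the model.

```python
import math
import random


def simulate_observation(mu_theta, sigma, intensity, T=1.0, dt=0.01, seed=0):
    """Euler-Maruyama path of dx = mu_theta dt + (sigma / sqrt(n_t)) dW.

    `intensity` maps (t, x) to a level n_t > 0, so the control may react to
    the path so far (a simple feedback rule, assumed for illustration).
    """
    rng = random.Random(seed)
    steps = int(round(T / dt))
    x, path = 0.0, [0.0]
    for k in range(steps):
        n_t = intensity(k * dt, x)               # must be strictly positive
        dW = rng.gauss(0.0, math.sqrt(dt))       # Wiener increment ~ N(0, dt)
        x += mu_theta * dt + sigma / math.sqrt(n_t) * dW
        path.append(x)
    return path


# Same drift and noise draws, two constant intensity levels.
low = simulate_observation(1.0, 1.0, lambda t, x: 1.0)
high = simulate_observation(1.0, 1.0, lambda t, x: 1e6)
```

At the very high intensity the path hugs the drift line, so its endpoint is close to the drift times the horizon; at unit intensity the same noise draws are scaled up a thousandfold.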

See  §17.6  in  Liptser  and  Shiryayev  (1978),  §17.5  in  Chernoff  (1972),  or  §4.2  in  Shiryayev 
(1978)  for  the  pure  problem  of  estimating  the  bivariate  drift  of  a  Brownian  motion.  Their 
motivation  is  its  link  to  the  heat  equation.  Neither  source  discusses  control  of  variance.  To 
be  sure,  without  time  preference,  there  is  no  pressing  reason  to  consider  such  an  exercise. 

Remark. Our choice of process $(x_t)$ yields a motivational pure control of variance.
But a realization $x_t = \int_0^t dx_s$ is an unweighted running integral of sample means, and so is
not a sufficient statistic for $((x_s, n_s), 0 \le s \le t)$ (w.r.t. the drift $\mu^\theta$). Consider instead the
running sample totals observation process $(\tilde{x}_t)$, obeying $d\tilde{x}_t^\theta = n_t\mu^\theta\,dt + \sigma\sqrt{n_t}\,dW_t$. In
that case, $\int_0^t n_s\,ds$ can be thought of as the running sample size, so that $(\tilde{x}_t, \int_0^t n_s\,ds) \in \mathbb{R}^2$
is a simple sufficient statistic for the mean. This process also yields a clearly well-defined
level-0 experimentation, as will our belief filter (2) in §4.2. We see also that a higher
intensity level $n_t$ essentially accelerates time, advancing the schedule on which the DM
observes future samples. The two processes $(x_t)$ and $(\tilde{x}_t)$ are mere conceptual devices,
and a choice between them is not critical: Both are sufficient for the drift $\mu^\theta$ (and hence
the state $\theta$), and crucially yield the same belief filter (2), which is what we work with.


⁵ This is why the signal must be defined recursively via an SDE rather than as an exogenous Ito process.


B. The Cost of Flow Experimentation. Intensity level $n$ incurs a flow cost $c(n)$.

(★): The cost function $c(n)$ is finite, increasing, strictly convex (and thus continuous) on
$[0,\infty)$, differentiable on $(0,\infty)$,⁶ with nonnegative fixed costs $c(0) \ge 0$. Marginal costs are
'Lipschitz-down' on $(0,\infty)$: $c'(n_2) - c'(n_1) \ge \lambda(n_2 - n_1)$, for some $\lambda > 0$, and any $n_2 > n_1 > 0$.

Strict cost convexity means $c(\gamma n_1 + (1-\gamma)n_2) < \gamma c(n_1) + (1-\gamma)c(n_2)$ for $n_1 \ne n_2$ and
$0 < \gamma < 1$. It precludes undesirable bang-bang control solutions. Both the Lipschitz-down
property (which is true if $c'' > 0$ exists) and differentiability play purely technical
roles, ensuring that our observation process (1) and its solution are well-defined.

Remark. Weak (though not strict) cost convexity remarkably obtains WLOG in
continuous time.⁷ For the DM can achieve any cost function arbitrarily close to the lower
convex hull $\mathrm{vex}(c) = \sup\{c_0 \le c \mid c_0 \text{ is convex}\}$. Indeed, for any $\varepsilon > 0$, any average cost at
least $\gamma c(n_1) + (1-\gamma)c(n_2) - \varepsilon$ is achieved in any time interval $[t_0, t_1]$ by sufficiently rapidly
chattering between $n_1$ and $n_2$ with weights $(\gamma, 1-\gamma)$ at small payoff loss. Then, let $\varepsilon \to 0$.
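The chattering logic can be illustrated with a deliberately non-convex cost. The sketch below assumes a square-root cost (chosen only for the example) and, as in the running sample totals reading of §3.3, that information accrues with the time-integrated intensity, so that only the average intensity over a short interval matters. The function name is hypothetical.

```python
import math


def chattered_cost(cost, n1, n2, gamma):
    """Average flow cost from a time share gamma at level n1, the rest at n2."""
    return gamma * cost(n1) + (1.0 - gamma) * cost(n2)


cost = math.sqrt                 # concave example cost (assumed for illustration)
n1, n2, gamma = 0.0, 4.0, 0.5

avg_intensity = gamma * n1 + (1.0 - gamma) * n2    # average level over the interval
direct = cost(avg_intensity)                       # cost of holding that level fixed
mixed = chattered_cost(cost, n1, n2, gamma)        # cost of chattering between n1, n2
```

Here chattering delivers the same average intensity at a strictly lower average cost than holding the level fixed, which is why only the lower convex hull of the cost function is relevant.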

C. The Objective Function. At each time $t$, the impatient DM, facing an interest
rate $r > 0$, chooses whether to stop and earn the final expected payoff $\pi(p_t)$, or to continue
at a chosen intensity level $n_t \ge 0$. Admissibility demands that the random stopping time
$T$ and the level $n_t$ each be functions of the observed history. In the Optimal Control and
Stopping (OCS) problem, the DM maximizes his expected discounted return less incurred
experimentation costs: $E\left[\int_0^T -c(n_t)e^{-rt}\,dt + e^{-rT}\pi(p_T) \mid p_0\right]$. We write the optimized value,
or supremum w.r.t. $T$ and $(n_t)$, as $V(p_0)$, since Appendix B.1 proves that the current
posterior belief $p_0$ on state H is a sufficient statistic for observed history to that moment.

4.  THE  OPTIMAL  LEVEL  OF  EXPERIMENTATION 

4.1  Competing  Static  and  Dynamic  Intuitions  for  the  Value  of  Information 

Let the convex function $\Pi$ describe the expected payoff of a one-shot Bayesian program.
At the prior belief $p$, a signal yielding the random posterior belief $q$ is worth $\mathcal{I}(q|p) =
E[\Pi(q) - \Pi(p)]$. This admits a motivational visual depiction. Beliefs being a martingale,
we have $p = E(q)$; therefore, we may tack on any multiple of $[Eq - p] = 0$. As $\Pi$ is
convex, it is differentiable for a.e. $p$. Put $d = \Pi'(p)$ when defined, and otherwise choose
any subgradient $d \in \partial\Pi(p)$, i.e. a slope between the left and right derivatives. Then
$\mathcal{I}(q|p) = E[\Pi(q) - \Pi(p) - d(q-p)]$ is the weighted area between $\Pi$ and any supporting
tangent line, with weights given by the density over $q$ (see Figure 1).


⁶ Thus, the cost function is $C^1$ (continuously differentiable); by convexity, it has a right derivative $c'(0+)$.
⁷ We thank Paul Milgrom for this nice insight. We do not wish to delve into a technical proof of this
assertion, as it would detract from our focus. We intend the point and explanation to be intuitive.


[Figure 1: the static payoff plotted against the belief $p \in [0, 1]$, with a supporting line of slope $d$ and the shaded area between them.]

Figure 1: Static Value of Information. With a continuous signal support, the shaded
area between a value function and any supporting line, appropriately weighted, is the value of
information. With discrete signals, we instead have a weighting of vertical line segments in this
shaded region (e.g. the thick dashed lines). In either case, for a purely static value function $\pi$,
the information value is maximized in the middle at $\hat{p}$, and generally is quasiconcave in $p$.

For the specific case of the static payoff frontier $\Pi = \pi$, this visual information value is
maximal at $\hat{p}$, when the DM is indifferent between the two actions. Intuitively, he values
information most when he is most uncertain about his action choice. Alternatively, the
DM likes to spread his posterior beliefs; the impact of new information on the variance
of posterior beliefs is increasing in $p(1-p)$, and peaks at $p = 1/2$, where the DM is most
uncertain as to which state is true. Either static logic suggests that information value and
thus its demand (extrapolating to margins) are quasiconcave 'hill-shaped' functions of $p$.
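The static hill shape is easy to check numerically. The sketch below assumes the build-or-abandon payoff with illustrative values (success pays 3, failure pays -1, so the indifference belief is 1/4) and a single binary signal that matches the true state with probability 0.7; these numbers and the function names are assumptions for the example only.

```python
def payoff(p, h=3.0, l=-1.0):
    """Static payoff of the build/abandon problem (illustrative h and l)."""
    return max(0.0, h * p + l * (1.0 - p))


def info_value(p, accuracy=0.7):
    """Expected payoff gain from one binary signal that is correct w.p. `accuracy`."""
    value = -payoff(p)
    # (likelihood in state H, likelihood in state L) for the 'good' and 'bad' signal.
    for like_H, like_L in ((accuracy, 1 - accuracy), (1 - accuracy, accuracy)):
        prob = p * like_H + (1 - p) * like_L     # chance of observing this signal
        posterior = p * like_H / prob            # Bayes rule
        value += prob * payoff(posterior)
    return value


grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=info_value)                 # belief with the highest static value
```

On this grid the one-shot value of information rises up to the indifference belief and falls thereafter, and it is zero at beliefs so extreme that no single signal can change the action.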

The above logic fails for our impatient DM in a dynamic setting. For the DM has
an incentive to minimize the present discounted cost of information, and therefore wishes to
delay  his  costly  high-intensity  experimentation  until  just  prior  to  stopping.  As  this  occurs 
only  at  extreme  beliefs,  information  demand  ought  to  be  greatest  for  extreme  beliefs.  Of 
course,  this  intuition  cannot  establish  the  shape  of  the  experimentation  schedule,  let  alone 
its  intertemporal  trend.  For  that,  we  must  consider  the  recursively-formulated  problem. 

4.2  The  Recursive  Formulation  and  Solution 

A. The Belief Filter and Bayes Problem. Intuitively (and formally, as we later
show), the observation process $(x_t, n_t)$ induces a diffusion belief process $(p_t)$. Given a prior
$p_0$, beliefs $(p_t)$ evolve according to Bayes rule in continuous time. If $\zeta = 2\mu/\sigma$ denotes
the signal-to-noise ratio factor of $(x_t)$, then Theorem 9.1 of Liptser and Shiryayev (1977)
(LS77) asserts:

$$p_t = p_0 + \int_0^t p_s(1-p_s)\,\zeta\sqrt{n_s}\,d\bar{W}_s \qquad (2)$$

where the Wiener increment $d\bar{W}_s = p_s\,d\bar{W}_s^H + (1-p_s)\,d\bar{W}_s^L$, and for $\theta = L, H$:

$$d\bar{W}_s^\theta = \frac{\sqrt{n_s}}{\sigma}\left[dx_s^\theta - [p_s\mu + (1-p_s)(-\mu)]\,ds\right] \qquad (3)$$

Alternatively, LS77's Theorem 9.1 implies that $(\bar{W}_t)$ is a Wiener process from the DM's
unconditional perspective,⁸ as he does not know the true drift, and so cannot observe the

⁸ Namely: As of time $s$, given observed history $(x_{t'}, 0 \le t' \le s) \cup (n_{t'}, 0 \le t' \le s)$, or more simply, just
$(x_t, \int_0^t n_s\,ds)$, the increment $\bar{W}_t - \bar{W}_s$ is Gaussian $\mathcal{N}[0, t-s]$ and independent of $\bar{W}_s$, for all $t > s$.


true noise process $(W_t)$ that drives the signal process in (1). The precise measure-theoretic
statement, in terms of signal filtrations, appears in Appendix B.1. As a driftless diffusion,
beliefs $(p_t)$ are an unconditional martingale, with least variance near the extremes 0 and 1.
Substituting the ex post observed history $(x_t, n_t)$ into these formulae reveals how beliefs
are computed. Intuitively, the DM updates beliefs upward in favor of $\mu > 0$ ($d\bar{W}_t > 0$) iff
the observation process rises faster than he expects, i.e. iff $dx_t > [p_t\mu + (1-p_t)(-\mu)]\,dt$.

Remark. Using (1), we find that $d\bar{W}_t^H = 2(\sqrt{n_t}/\sigma)\mu(1-p_t)\,dt + dW_t$ and $d\bar{W}_t^L =
-2(\sqrt{n_t}/\sigma)\mu p_t\,dt + dW_t$. Plugging either Ito process (i.e. Ito stochastic integral) $(\bar{W}_t^\theta)$ into
the belief filter (2) yields the SDE solved by the conditional belief processes $(p_t^H)$ or $(p_t^L)$.
Not surprisingly, beliefs have a positive drift in state H, and a negative drift in state L.
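As a worked check, the substitution can be carried out explicitly. The display is a sketch that writes $\zeta = 2\mu/\sigma$ for the signal-to-noise factor, $W$ for the true noise process in (1), and $p_t^H$, $p_t^L$ for beliefs conditional on the state; any notation beyond the surrounding text is our labeling.

```latex
\begin{aligned}
dp_t^H &= p_t(1-p_t)\,\zeta\sqrt{n_t}\,\big[\zeta\sqrt{n_t}\,(1-p_t)\,dt + dW_t\big]
        = n_t\zeta^2\,p_t(1-p_t)^2\,dt + p_t(1-p_t)\,\zeta\sqrt{n_t}\,dW_t,\\
dp_t^L &= p_t(1-p_t)\,\zeta\sqrt{n_t}\,\big[-\zeta\sqrt{n_t}\,p_t\,dt + dW_t\big]
        = -\,n_t\zeta^2\,p_t^2(1-p_t)\,dt + p_t(1-p_t)\,\zeta\sqrt{n_t}\,dW_t,\\
0 &= p_t\cdot n_t\zeta^2\,p_t(1-p_t)^2 \;+\; (1-p_t)\cdot\big(-\,n_t\zeta^2\,p_t^2(1-p_t)\big).
\end{aligned}
```

The last line weights the two conditional drifts by the beliefs $p_t$ and $1-p_t$; they cancel exactly, consistent with beliefs being an unconditional martingale.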

B. The Value Function. The supremum value of the OCS problem is

$$V(p_0) = \sup_{T,(n_t)} E\left[\int_0^T -c(n_t)e^{-rt}\,dt + e^{-rT}\pi\left(p_0 + \int_0^T p_t(1-p_t)\,\zeta\sqrt{n_t}\,d\bar{W}_t\right) \,\Big|\, p_0\right] \qquad (4)$$

Standard for optimal learning, this value function is convex. For intuitively, a signal
spreading the belief $p_0 = \gamma_1 p_1 + \gamma_2 p_2$ to $p_i$ with chance $\gamma_i$ ($\gamma_1 + \gamma_2 = 1$) cannot possibly hurt
the DM, as he can ignore it: If he optimizes at each $p_i$, he then gets $\gamma_1 V(p_1) + \gamma_2 V(p_2) \ge
V(p_0)$. The next lemma, proved in Appendix B.1, mandates threshold stopping rules.

Lemma 1 (Convexity) Consider any cost function $c(n) \ge 0$. Then the supremum value
$V$ is convex in $p$. Also, $V(p) = \pi(p)$ for $p \le \underline{p}$ and $p \ge \bar{p}$, for some cut-offs $0 < \underline{p} < \bar{p} < 1$.
If the null action is ever exercised, then $V(p) = \pi(p)$ in $[\underline{p}_0, \bar{p}_0]$, where $\underline{p} < \underline{p}_0 < \bar{p}_0 < \bar{p}$.

C. Optimality Conditions. The DM selects action A for $p \le \underline{p}$, action B for
$p \ge \bar{p}$, and absent the null action, he experiments at level $n(p)$ for $p \in (\underline{p}, \bar{p})$, an open
set. If $\pi \le 0$ ever, then the null action may be chosen in a subinterval $[\underline{p}_0, \bar{p}_0] \subset (\underline{p}, \bar{p})$,
where $\underline{p} < \underline{p}_0 < \bar{p}_0 < \bar{p}$. We then partition the OCS problem into an Optimal Control (OC)
exercise for the schedule $n(p)$ (Appendix B.1.c proves this Markovian form) and an Optimal
Stopping (OS) problem for the boundaries $\underline{p}, \bar{p}$, and perhaps $\underline{p}_0, \bar{p}_0$. The experimentation
domain is then $\mathcal{E} = (\underline{p}, \bar{p})$, or $\mathcal{E} = (\underline{p}, \underline{p}_0) \cup (\bar{p}_0, \bar{p})$ if the null action is viable. We now develop
recursive equations for what we call the value $v$; we then prove in Proposition 1 that $v$
exists, and coincides with the supremum value, or $v = V$. Hence, we solve for the optimal
dynamic policy using ordinary differential equation (ODE) methods in Proposition 2.

Since beliefs $(p_t)$ are a martingale obeying (2), for any given experimentation region $\mathcal{E}$,
the Hamilton-Jacobi-Bellman (HJB) equations associated to the OC problem are

$$rv(p) = \sup_{n \ge 0}\left\{-c(n) + 0 \cdot v'(p) + (1/2)\,n\,p^2(1-p)^2\zeta^2\,v''(p)\right\}$$
$$\Rightarrow \quad rv(p) = \sup_{n \ge 0}\left\{-c(n) + n\Sigma(p)v''(p)\right\} \qquad (5)$$

where $\Sigma(p) = p^2(1-p)^2\zeta^2/2$ measures "belief elasticity", plus the value matching condition:

$$v(\underline{p}) = \underline{p}\,\pi_A^H + (1-\underline{p})\,\pi_A^L, \qquad v(\bar{p}) = \bar{p}\,\pi_B^H + (1-\bar{p})\,\pi_B^L, \qquad \text{and possibly} \quad v(\underline{p}_0) = v(\bar{p}_0) = 0 \qquad (6)$$

For any given control policy $n(p)$, the Stefan problem (ST) associated to the OS problem
is: $rv(p) = -c(n(p)) + n(p)\Sigma(p)v''(p)$, plus (6), and the free boundary smooth pasting
conditions (7), that the value $v$ be tangent to the static payoff function $\pi$ at $\underline{p}$ and $\bar{p}$:

$$v'(\underline{p}) = \pi_A^H - \pi_A^L, \qquad v'(\bar{p}) = \pi_B^H - \pi_B^L, \qquad \text{and possibly} \quad v'(\underline{p}_0) = v'(\bar{p}_0) = 0 \qquad (7)$$

While the functional problems ST and HJB cannot be solved in closed form, we can
create a unified equivalent (second order nonautonomous) ODE problem (EU) below with
free boundaries, and fully characterize its solution $\{v(p), \underline{p}, \bar{p}\}$ without the closed form.

Proposition 1 (Value Existence / Uniqueness / Verification) Assume (★).

(a) There exists a unique solution $(\underline{p}, \bar{p}, v)$ or $(\underline{p}, \bar{p}, \underline{p}_0, \bar{p}_0, v)$ of HJB+ST, namely (5)-(7),
with value $v \in C^2$ strictly convex ($v'' > 0$) in $\mathcal{E}$, and interior thresholds $0 < \underline{p} < \bar{p} < 1$.

(b) The solution $v$ coincides with the maximized value $V$ of OCS: objective (4).

(c) The value $v$ is jointly increasing/decreasing/U-shaped in $p$ with the static payoff $\pi$.

Proof Sketch: The proof is largely appendicized. Here, we provide some easier and
more intuitive insights. For a fixed domain $\mathcal{E}$, the FOC for HJB (5) is $c'(n) = \Sigma(p)v''(p)$;
the SOC is met because $c(n)$ is strictly convex, and thus $-c(n) + n\Sigma(p)v''(p)$ is strictly
concave in $n$. The solution $n(p) < \infty$ then uniquely exists iff $rv(p) = -c(n) + n\Sigma(p)v''(p) =
-c(n) + nc'(n) \equiv g(n)$ is soluble in $n$. Now, $g$ is clearly continuous as $c$ and $c'$ are; it is
strictly increasing too by Claim 1 in §A.1. Since $c'(0+) < \infty$ if $c(n)$ is everywhere finite and
convex, then $g(n) \downarrow -c(0)$ as $n \downarrow 0$, and we just set $g(0) = -c(0) \le 0$. Thus, $g(0) < rv(p)$
for $p \in \mathcal{E}$, because then $v(p) > 0$. Also, $g(n) > r\max(\pi(0), \pi(1)) \ge rv(p)$ for large $n$,
because $g(n)$ is unbounded above by Claim 1 in §A.1. Given $rv(p) = g(n(p))$, we have
$n(p) = f(rv(p))$ for the strictly increasing inverse $f = g^{-1}$.
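The inversion step can be illustrated numerically. The sketch below assumes a quadratic cost, chosen purely for illustration, for which the surplus term is the square of the level and the optimal level is therefore the square root of the flow return. The function names, the bisection routine, and the sample return of 0.09 are hypothetical.

```python
def surplus(n, c, c_prime):
    """Surplus term n * c'(n) - c(n) that the first-order condition equates to r*v."""
    return n * c_prime(n) - c(n)


def optimal_level(w, c, c_prime, hi=1.0, iterations=200):
    """Invert the increasing surplus function at flow return w by bisection."""
    while surplus(hi, c, c_prime) < w:   # widen the bracket until it contains the root
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if surplus(mid, c, c_prime) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def c(n):
    return n ** 2          # assumed quadratic flow cost, with no fixed cost


def c_prime(n):
    return 2.0 * n         # its marginal cost


w = 0.09                   # flow return r*v, e.g. r = 0.1 and v = 0.9 (illustrative)
n_star = optimal_level(w, c, c_prime)
```

For this cost the numerical root agrees with the closed form, the square root of 0.09, and a higher flow return always maps to a higher level because the surplus function is increasing.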

We have established that HJB+ST is equivalent to the two-point boundary value
problem EU: $v'' = c'(f(rv))/\Sigma$, cum (6) and (7). Theorem 1 in §A.2 proves that a unique
solution to EU exists, and thereby HJB+ST is uniquely soluble (part (a) here), while
this solution is the supremum value of OCS (part (b) here) by Theorem 3 in §B.3.

For (c), $v$ shares the shape of $\pi$ by convexity, value matching, and smooth pasting. □

Remark.  Strict  cost  convexity  rules  out  one  undesirable  and  not  implausible  outcome: 
suddenly exploding the experimentation level over a vanishing time interval $[0, \Delta]$, $\Delta \to 0$.
With  a  linear  (i.e.  not  strictly  convex)  cost  function,  such  a  policy  that  quickly  achieves 
arbitrarily  perfect  information  is  preferred  with  discounting,  absent  any  cost  premium. 


Proposition 2 (Policy Existence / Uniqueness) Assume (★).

(a) If payoffs π(p) > 0 for all p ∈ [0,1], then the solution (f(rv(·)), p̲, p̄) from HJB+ST
is the unique optimal policy for OCS.

(b) The marginal cost ξ(w) = c'(f(w)) of the optimal intensity level n = f(w) is increasing,
strictly concave, and differentiable in the return w = rv > 0, even if c' is not differentiable.

(c) If π(p) = 0 at some p ∈ [0,1], then the policy (f(rv(·)), p̲, p̄) is uniquely optimal for
OCS if either c(0) > 0, or if c(0) = 0 but there exists η > 0 with lim_{w↓0} w^{1−η} ξ'(w)/ξ(w) < ∞.

(d) The level n(p) is continuous in ℰ; n'(·) exists if c''(·) does; n(·) is C¹ when c(·) is C².

Proof Sketch: Part (a) is proven in Theorems 4-5 in §B.3, and (c) in Theorem 6 in §B.4.
Part (b) is established in the appendicized Claims 2-4. For (d), assume that c''(n) > 0
exists, and hence so does g'(n) = nc''(n) > 0. Since f is differentiable if g = f⁻¹ is, so is
n(p) = f(rv(p)). If c'' is continuous, then so is g', and thus f' and n' too, as claimed. □

Any geometric convex cost function c(n) = n^k (k > 1) violates the elasticity condition
in Proposition 2(c), which is met by exponential functions like c(n) = e^{an} − 1 (some a > 0).

Remark. We note after Theorem 6 of §B.4 that if π = 0 somewhere and c'(0) = c(0) = 0,
so that the final proviso in Proposition 2(c) fails, the supremum V(p) of OCS is not
attained; therefore, no optimal policy exists. This pathological case, an undesirable
by-product of the continuous time approximation, arises since the DM may never choose A (or
the null action) in finite time given the negligible cost of running very small experiments.

4.3 The Optimal Experimentation Level: The DM as an Information Firm

Even though the FOC c'(n) = Σ(p)v''(p) and the policy equation n(p) = f(rv(p)) are
jointly insoluble in closed form, the monotonicity of f allows us to conclude that:

Lemma 2 (Monotonicity) Given (★), the optimal intensity level n(p) is strictly
increasing in the value v(p) for p ∈ ℰ. It weakly exceeds f(0) ≥ 0, with f(0) = 0 iff c(0) = 0.

For instance, quadratic costs c(n) = n² yield surplus g(n) = n², and thus f(n) = √n.
The experimentation level is then an increasing concave function √(rv(p)) of the return.

Here is a concrete economic intuition for the monotonicity of v ↦ n. We formally
argue that the bang per research dollar is greater with a higher value, and therefore the
level n rises with v. There are two decisions at each moment in time: experiment or stop
(OS), and if to experiment, at what level (OC). Focus first on the level choice. Optimality
demands that the marginal cost c'(n) of information equal its marginal benefit. Since belief
precision is linear in the experimentation level n, the marginal benefit of experimentation
MB = Σ(p)v''(p) is constant, and the total benefit is then linear: nMB. So at an optimum,
g(n) = nc'(n) − c(n) = nMB − c(n) equals total (flow) benefits less total (flow) costs of
experimentation, or the (flow) producer surplus of information. Imagine the DM as a
competitive firm, facing an increasing marginal cost curve, "selling" himself information
at the constant "price" Σ(p)v''(p). The surplus g(n) rises in the optimal quantity if c is
strictly convex. Next for the optimal stopping decision, the cost of delay must equal the
surplus from optimally experimenting, or rv(p) = g(n(p)). This surplus rises in n with
convex costs. So as the value v and thus delay cost rv rises, an experimenting DM must
choose a higher level n to generate the requisite higher surplus. Simply put, the DM only
closes down his informational firm (and acts) when he cannot generate sufficient net profits
(producer surplus) to justify the time cost of his capital rental (his deferred action).

Figure 2: Value Function and Experimentation Demand. Overusing the vertical axis,
we depict both the static payoff function π (thick dashed line) and dynamic value function v
(solid line), strictly convex in the experimentation domain ℰ = (p̲, p̄), and the intensity level n
(thin dashed line). The demand is increasing in v. The R&D model is illustrated on the left, and
a more general decision model (no null action) on the right. The option value of experimentation
(the vertical distance between the static and dynamic values) is maximized at the π kink p̂.

Proposition 3 (The Optimal Level of Experimentation) Assume the static payoff
frontier π(p) is increasing (resp. decreasing, U-shaped) in p. Given (★), the optimal level
of experimentation n(p) is increasing (resp. decreasing, U-shaped) in p in the domain ℰ.

For immediate context, consider our R&D spin. Here, π(p) strictly increases in p, and
thus the research level rises as we approach confidence in the 'build' decision (see Figure 2,
left panel). This provides optimizing foundations for a commonly-observed phenomenon:
A potential uncertain investment has new life/money breathed into it by a key discovery
or finding; research expenditures stochastically grow over time, only later on to (i) shrink
because of discouragement (e.g. cold fusion), or (ii) continue growing, as the project is
likely headed for development (recently, spinal cord research or a new shuttle rocket engine
design). A similar pattern also emerges in how clinical tests of new drugs proceed:
first, small tests, and sometimes later, larger tests, and then quite expensive field trials.^


^A rare empirical study of project-level R&D expenditures in the pharmaceutical industry (DiMasi,
Grabowski, and Vernon (1995)) reveals a pattern strikingly similar to our theoretical prediction.
Time-intensive pre-FDA clinical testing usually occurs over three sequential conditional phases, of growing size.
4.4 The Option Value of Experimentation

Write v(p) = π(p) + I(p), where I(p) is the expected present value of information (or
experimentation). Since learning costs time and money, the value v necessarily inherits
the general shape of the static payoffs π. For simplicity, assume π(p) > 0 always, obviating
a null action. We next show that I(p) peaks at the kink p̂ in π where the myopically-best
action switches: Experimentation is most valuable when the DM is most uncertain as
to which action to take, as no action is dominant. Think of I(p) as the option value of
waiting and choosing an action after optimally experimenting. This option to change one's
plan is worth the least when one is most sure of an action to take. So the suggestion in
section 4.1 of a hill-shaped value of information is an apt description of this option value;
however, any implication that it alone determines the experimentation level was false, for
the flow return to experimentation also includes the terminal payoff π(p).

Proposition 4 Assume (★). If the null action is never exercised, then the option value
of experimentation I(p) is single-peaked in p, maximized at p̂ where π_A and π_B cross. If
the null action is exercised, then I(p) has two peaks: one at π_A = 0 and one at π_B = 0.

Proof of first case: By value matching and smooth pasting (6)-(7), v − π_A rises on (p̲, 1],
and v − π_B falls on [0, p̄). So v − π is rising until π_A and π_B cross, and later falling. □

Remark. Since information value owes to v'' > 0, some have suggested that the sign
of v''' should be relevant for the experimentation derivative. It is instructive to see that it
is not. Note that since v''(p) = ξ(rv(p))/Σ(p), and ξ' exists by Proposition 2, v''' exists,
and equals

$$v'''(p) = \frac{r\,v'(p)}{\Sigma(p)\,f(rv(p))} + \frac{2(2p-1)\,\xi(rv(p))}{\Sigma(p)^{3/2}}.$$

Consider the R&D model, where n' > 0 always. Clearly, if payoffs are such that p̲ < 1/2 < p̄,
then v''' > 0 on (1/2, p̄), while v''' < 0 just above p̲, since v'(p̲) = 0 by smooth pasting.

4.5  Expected  Remaining  Time  and  Experimentation  Costs 

We now analyze the behavior of the only two costs of experimentation in our model:
time and money. Assume a stopping time T < ∞ a.s., which is true under the assumptions
of Proposition 2(a) or (c). Then by §15.3 of Karlin and Taylor (1981) (KT81), since (p_t) has
zero drift and variance 2Σ(p)n(p), the expected remaining time τ(p) = E[T | p_0 = p] until
stopping obeys the boundary conditions τ(p̲) = τ(p̄) = 0, as well as the ODE: −1 =
0 + Σ(p)n(p)τ''(p). Hence, τ''(p) < 0, and consequently, τ(p) is hill-shaped.
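
The boundary value problem for the expected remaining time can be sketched numerically. The block below is only an illustration under stated assumptions: it takes the product Σ(p)n(p) as a user-supplied function `variance_rate`, and the demo uses a hypothetical Σ(p) = p²(1 − p)² with a constant level n = 1 and hypothetical thresholds 0.2 and 0.8, none of which come from the text. It discretizes Σ(p)n(p)τ''(p) = −1 by central differences and solves the resulting tridiagonal system.

```python
def expected_time(variance_rate, p_lo, p_hi, m=200):
    """Solve variance_rate(p) * tau''(p) = -1 with tau(p_lo) = tau(p_hi) = 0.

    Uses m interior grid points, central differences, and the Thomas
    algorithm for the tridiagonal system (sub- and super-diagonal equal 1).
    """
    h = (p_hi - p_lo) / (m + 1)
    grid = [p_lo + (i + 1) * h for i in range(m)]
    # Row i: tau[i-1] - 2*tau[i] + tau[i+1] = -h^2 / variance_rate(p_i)
    rhs = [-h * h / variance_rate(p) for p in grid]
    diag = [-2.0] * m
    for i in range(1, m):                   # forward elimination
        factor = 1.0 / diag[i - 1]
        diag[i] -= factor
        rhs[i] -= factor * rhs[i - 1]
    tau = [0.0] * m
    tau[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):          # back substitution
        tau[i] = (rhs[i] - tau[i + 1]) / diag[i]
    return grid, tau

# Hypothetical inputs, for illustration only.
sigma = lambda p: (p * (1.0 - p)) ** 2      # assumed form of Sigma(p)
level = lambda p: 1.0                       # assumed constant intensity n(p)
grid_demo, tau_demo = expected_time(lambda p: sigma(p) * level(p), 0.2, 0.8)
print(max(tau_demo))                        # peak of the hill-shaped tau
```

With a constant variance rate k the exact solution is τ(p) = (p − p̲)(p̄ − p)/(2k), which gives a convenient check on the discretization.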

We next write v(p) = R(p) − κ(p), or the expected present value of final rewards R(p) =
E[e^{−rT}π(p_T) | p_0 = p] less that of experimentation costs κ(p) = E[∫_0^T e^{−rt}c(n(p_t))dt | p_0 = p].
Since κ(p) eventually falls near the extremes given the vanishing expected time horizon,
it is clearly nonmonotonic with κ(p̲) = κ(p̄) = 0. By KT81, it also satisfies the ODE
Σ(p)n(p)κ''(p) = rκ(p) − c(n(p)). When n(p) and thus c(n(p)) is everywhere positive and
U-shaped, κ(p) is smaller and thus concave near p̲ and p̄. If κ(p) < c(n(p))/r always
then κ is everywhere concave, and is single-peaked. But if κ(p) ever exceeds c(n(p))/r
(as seems quite plausible for middling p when n(p) and thus c(n(p)) is U-shaped)
then it must become convex, and any crossing is an inflection point; since κ ∈ C², it
strangely must then be 'molar-tooth' shaped: two local maxima, bracketing two inflection
points at p1 < p2, with κ(p) > c(n(p))/r on (p1, p2) ⊂ (p̲, p̄). Which scenario arises is an
open question. Quite plausibly, κ is molar-tooth shaped only for low enough r (so that
experimentation lasts a long time), and disparate payoffs (extremely U-shaped costs).

4.6  Experimentation  Drift 

Since the belief process (p_t) is a martingale diffusion, and v ∈ C², values (v(p_t)) are an
Ito process by Ito's Lemma, and also a strict submartingale (drifts up) in ℰ, since v'' > 0
in ℰ. Likewise, (τ(p_t)) is a strict supermartingale Ito process (drifts down), as τ'' < 0
in ℰ. Finally, (κ(p_t)) is everywhere a strict supermartingale Ito process if κ is hill-shaped
(κ'' < 0), and otherwise, it must switch to a strict submartingale inside (p1, p2), where κ'' > 0.

By the same reasoning, the experimentation level process (n(p_t)) is a submartingale if
n(p) is convex. But n may well be concave in rv, as when c(n) = n². Since v is strictly
convex in p ∈ ℰ by Proposition 1, the convexity of n(p) = f(rv(p)) is then unclear. But if
the producer surplus g is weakly concave and increasing in n, then its inverse f is weakly
convex and increasing in w. By Theorem 5.1 of Rockafellar (1970), the composition f(rv)
of a convex and increasing function f with a strictly convex function rv is strictly convex.

A more refined statement is possible when c''' exists. Indeed, differentiate the Bellman
equation, that surplus equals the delay cost, g(n(p)) = rv(p), to get g'(n)n'(p) = rv'(p).
Differentiating once more, and applying the optimal control FOC v''(p) = c'(n(p))/Σ(p),
yields a simple nonautonomous second order differential equation in the level n alone:^10

$$[nc''(n)]\,n'' + [nc''(n)]'\,(n')^2 = [g'(n)\,n'(p)]' = r\,v''(p) = r\,c'(n)/\Sigma(p) \qquad (8)$$

Since c' > 0, (8) implies n'' > 0 at least when [nc''(n)]' ≤ 0, i.e., when the information
producer surplus is itself concave in the experimentation level, as already asserted.

Proposition 5 Assume (★). If the producer surplus g(n) = nc'(n) − c(n) is concave, then
the intensity level n(p) is strictly convex in beliefs p ∈ ℰ, and thus (n(p_t)) is a submartingale.

^10 This more simply yields the nonautonomous second order differential equation in the value v alone:
c'(f(rv(p))) = Σ(p)v''(p). Equation (8) does not necessarily hold even though v'' exists since the
composition g(n(p)) may be twice differentiable even if we cannot write its first derivative as g'(n(p))n'(p).


Figure 3: A Payoff Shift in the R&D Model. When the (unplotted) bad build payoff ℓ
rises, so does the value function, and intensity levels (inside the now left-shifted interval (p̲, p̄)).

For instance, any convex geometric cost function c(n) = n^k (k > 1) yields a convex producer
surplus: [nc''(n)]' = k(k − 1)²n^{k−2} > 0. The knife-edge case c(n) = 1 + n log n for n ≥ 1
(c(n) = n for n < 1) yields nc''(n) = 1 constant, and so linear surplus. Note that a concave
surplus function g(n) is sufficient that intensity levels (n_t) be a submartingale. In light
of (8), for large enough v'', we may still have n'' > 0 even if g(n) is slightly convex.
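
As a worked check of the two cost examples just given, the derivatives can be written out directly. This only restates the computation behind the claims above, using g'(n) = nc''(n):

```latex
% Geometric cost c(n) = n^k with k > 1:
\begin{aligned}
c'(n)  &= k\,n^{k-1}, \qquad c''(n) = k(k-1)\,n^{k-2},\\
g(n)   &= n\,c'(n) - c(n) = (k-1)\,n^{k},\\
g'(n)  &= n\,c''(n) = k(k-1)\,n^{k-1},\\
g''(n) &= [\,n\,c''(n)\,]' = k(k-1)^{2}\,n^{k-2} > 0 .
\end{aligned}

% Knife-edge cost c(n) = 1 + n\log n for n \ge 1:
\begin{aligned}
c'(n) &= 1 + \log n, \qquad c''(n) = 1/n,\\
g(n)  &= n(1+\log n) - (1 + n\log n) = n - 1, \qquad g'(n) = n\,c''(n) = 1 .
\end{aligned}
```

So the geometric surplus is strictly convex for every k > 1, while the knife-edge surplus is exactly linear on n ≥ 1.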

Remark. If [nc''(n)]' vanishes in (8), then n is locally convex. In particular, for the
R&D model, if lim_{n↓0} nc'''(n) = 0 and c''(0) < ∞, then one can show by l'Hôpital's rule
that n''(p) > 0 in a neighborhood [p̲, p̲ + ε) for some ε > 0; therefore, the experimentation
level (n(p_t)) is locally a submartingale. In particular, if c(n) = n^k, then lim_{n↓0} c''(n) =
lim_{n↓0} nc'''(n) = lim_{n↓0} O(n^{k−2}) = 0 when k > 2. So for barely profitable R&D projects
and not too convex costs, the experimentation level is at least initially expected to rise.

5.  SENSITIVITY  ANALYSIS 

5.1  Parametric  Shifts 

We now explore how the experimentation schedule moves with changes in the payoffs,
information cost, or interest rate. Appendix C develops a method of tangents, which we
brutally illustrate in Figure 3 for the R&D model. When ℓ rises, so does v, and hence n.
But the tangent line ph + (1 − p)ℓ tilts upward too, and thus the thresholds p̲ and p̄ must
fall, since (rather loosely) the value function is less curved. This is intuitive: When the
reward is higher, one is indifferent about adopting or quitting when slightly less optimistic,
so that p̲ and p̄ both shift in the same direction (but up, if h had fallen).

Since the sole reason to pay for information is uncertainty over the state of the world,
a most natural thought experiment stems from raising the payoff risk. In other words,
assume that h rises, and ℓ falls so as to maintain a constant expected payoff π_B(p) from
that action at the current belief p. Intuitively, this ought to raise the value of the dynamic
problem, since the DM should prefer a riskier final payoff distribution. For the static payoff
frontier in the current stopping set goes up. Then by employing the same level decision
and stopping rule, the DM's current value intuitively increases. By re-optimizing, he does
no worse. Hence the return rv and thus the experimentation level n both rise.

The DM also benefits from less convex information costs, and from more powerful
signals (higher ζ). Since the value is greater, so is the intensity level. Finally, with a
higher interest rate r, the DM enjoys a lower expected payoff (since discounting is a key
cost), and is more eager to stop and act: Thresholds both shift in. Less obviously, the
value often falls proportionately less than the interest rate rises, so that the return rv, and
thus the intensity level n, rises.

To  avoid  a  case-by-case  analysis  that  adds  little  to  the  general  story,  the  portmanteau 
result  that  follows  (proved  in  Appendix  C.l)  ignores  complications  due  to  the  null  action. 

Proposition 6 Assume (★), and payoffs so high that the null action is never taken.

(a) Payoff Levels: The value v(p) and experimentation level n(p) shift up for p ∈ [p̲, p̄]
when any payoff rises. Thresholds p̲, p̄ rise if either payoff to action A rises, and fall if
either payoff to action B rises.

(b) Payoff Risk: If payoffs grow riskier (the expected payoff π_a(p) remains constant for
an action a at belief p, but the spread between its payoffs in the two states increases), then
the value v(p) rises, the thresholds "shift out" (p̲ falls and p̄ rises), and the experimentation
level n(p) rises.

(c) Cost Convexity: Assume that the cost function grows "more convex": namely,
c(n) is replaced by ĉ(n), where ĉ(0) = c(0), ĉ(n) − c(n) is convex, and the corresponding slope
of ξ = c'(f) (the marginal costs at the optimum) at 0 is higher: ξ̂'(0+) > ξ'(0+). Then
thresholds shift in, and the value v(p) and intensity level n(p) uniformly fall in ℰ. With
no fixed costs ĉ(0) = c(0) = 0, this is true if ĉ(n) − c(n) is convex and ĉ'(0) > c'(0) > 0.

(d) Information Quality: As the signal-to-noise ratio factor ζ rises, the value v(p) and
the experimentation level n(p) shift up, while the thresholds shift out.

(e) Impatience: As the interest rate r rises, the value v(p) falls, and thresholds shift in.
Also, the optimal intensity level n(p) rises strictly near one or both thresholds in ℰ. In the
R&D model, n(p) declines for all p < p', and rises for all p > p', for some p' ∈ (p̲, p̄).

Surprisingly, the impatience result runs counter to the folk wisdom of Bayesian learning.
More impatient decision makers typically 'experiment' less (e.g. in the canonical settings of
§2.B), with more myopic actions. In our model, greater impatience raises the DM's delay
cost, and somewhere induces him to accelerate his experimentation. Further, as
Proposition 7 will assert, as r blows up, the DM experiments at an exploding rate, albeit over
a vanishing belief interval. The 'folk intuition' depends on never-ending experimentation,
whereas experimentation has a finite purpose here, that one eventually stops and acts.
Information accrual here is a means to an end; it is not a payoff-generating lifestyle.

Our analysis in the paper so far remains valid if the final payoff is an annuity,
i.e. where π(p_T) is an eternal flow payoff rather than a one-shot lump-sum, as we have
assumed. What happens in this case is that the final decision is formally very much like
the safe uninformative arm in a bandit model: It provides a constant flow current payoff,
and is therefore exercised when equal to the current value. So with a higher interest
rate r, the annuity is worth less, and the intensity level falls everywhere.^11 Indeed, the
Bellman equation for the maximization E[∫_0^T −c(n_t)e^{−rt}dt + e^{−rT}π(p_T)/r] is still (5). But
in terms of the return w = rv, it becomes w(p) = max_{n≥0}{−c(n) + nΣ(p)w''(p)/r}. Hence,
n(p) = f(w(p)), and w solves w''(p) = rc'(f(w(p)))/Σ(p). The value matching and smooth
pasting conditions are still w(p̲) = π(p̲), w(p̄) = π(p̄), w'(p̲) = π'(p̲), w'(p̄) = π'(p̄). A higher
interest rate r is then formally equivalent to a lower signal-to-noise ratio factor ζ; thus,
this diminishes the return w(p), and n(p) = f(w(p)), by the logic of Proposition 6.

5.2  The  Return  to  Wald's  World 

We  finally  turn  full  circle,  recalling  our  two  paired  twists  on  Wald's  sequential  paradigm: 
impatience  and  strict  cost  convexity.  As  noted,  these  assumptions  cut  in  opposite  ways. 
Absent payoff discounting, the DM sees no hurry to stack experiments, and reverts to a
purely sequential mode (vanishing intensity levels) barring any fixed flow experimentation
cost c(0) > 0. Without strict cost convexity, the DM faces no parallel experimentation
penalty, and he becomes a classical statistician, running a massive experiment at time-0.

Proposition 7 Assume the final payoff π(p) > 0 everywhere.

(a) Vanishing Impatience: For fixed c(n) obeying (★), the intensity level n(p) explodes
(where > 0) as r ↑ ∞, and decreases to f(0) ≥ 0 as r ↓ 0. Thus, n(p) ↓ 0 as r ↓ 0 iff c(0) = 0.

(b) Vanishing Convexity: Fix r > 0. Consider a cost function sequence c_1(n), c_2(n), ...
obeying (★). Let λ_k, Λ_k be the lower and upper Lipschitz constants of the marginal cost
c_k': λ_k(n2 − n1) ≤ c_k'(n2) − c_k'(n1) ≤ Λ_k(n2 − n1) for all n2 > n1 ≥ 0. Then n_k(p) uniformly
explodes (where positive) if lim_{k→∞} Λ_k = 0, and uniformly vanishes if lim_{k→∞} λ_k = ∞.

Proof: To see the impatience limits, consider that since v(p) is bounded above by the largest
of the four payoffs, the return rv(p) vanishes as r → 0. Because g(0) = −c(0), so that
f(−c(0)) = 0, the optimal intensity level must satisfy n(p) ≥ f(0) ≥ 0 for all p ∈ ℰ, and so
n(p) tends down to f(0). Likewise, since v(p) > π(p) > 0 for p ∈ ℰ, rv(p) and n(p) explode
as r → ∞. □

Proof of the convexity limit is appendicized, but for a powerful example, consider
c(n) = n^k for k > 1; the producer surplus is g(n) = nc'(n) − c(n) = (k − 1)n^k. Its inverse
f(n) = [n/(k − 1)]^{1/k} blows up as k ↓ 1, as we converge upon Wald's case of linear costs.
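
The blow-up of the inverse surplus under geometric costs is easy to tabulate. The sketch below evaluates f(w) = [w/(k − 1)]^{1/k} at a fixed, hypothetical return w = 1 for several exponents k approaching 1. It holds w fixed for illustration, whereas in the model the return rv itself depends on the cost function; the function name is illustrative.

```python
def inverse_surplus_power(w, k):
    """Inverse f(w) of the surplus g(n) = (k - 1) * n**k for cost c(n) = n**k, k > 1."""
    if k <= 1.0:
        raise ValueError("strict convexity requires k > 1")
    return (w / (k - 1.0)) ** (1.0 / k)

w = 1.0                                     # hypothetical fixed return rv
exponents = (2.0, 1.5, 1.1, 1.01)
levels = [inverse_surplus_power(w, k) for k in exponents]
for k, n in zip(exponents, levels):
    print(f"k = {k:5.2f}   f(w) = {n:.3f}")  # levels grow as k falls toward 1
```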
^11 We are very grateful to Sven Rady for discovering this key difference.

6. CONCLUSION

Summary.  This  paper  has  explored  the  basic  two  action,  two  state  decision  problem 
under uncertainty. We have modified Wald's model by assuming an impatient DM, and
particularized  it  to  the  case  of  an  increasing  and  strictly  convex  cost  function  of  within- 
period  experimentation.  These  two  plausible  economic  assumptions  have  jointly  afforded 
a  reasonably  complete  Bayesian  characterization  of  information  demand:  Research  levels 
grow  with  the  optimism  over  the  project.  For  not  too  convex  cost  functions,  they  drift 
upwards  over  time.  This  yields  some  simple  falsifiable  implications  in  an  R&D  context. 

Our  conclusions  have  not  come  in  the  standard  discrete  time  setting,  but  in  a  new 
continuous  time  control  of  variance  for  a  diffusion.  We  consider  this  framework  a  key 
contribution  of  the  paper,  as  it  is  a  tractable  new  off-the-shelf  decision  model  of  information 
purchases  or  R&D.  We  believe  that  it  can  be  applied  in  many  settings,  such  as  strategic 
patent  races,  general  equilibrium  R&D  models,  or  principal-agent  experimentation  models. 

Robustness. While we have studied a model with just two actions and two states,
our main intensity-value monotonicity result n = f(rv), and the associated information
producer surplus intuition, extend to finitely many actions and states. It also obtains in
a normal learning model with state space Θ = ℝ; however, the resulting problem is not
stationary in the posterior mean alone, as total research outlays so far increase one's belief
precision. Such a DM eventually becomes convinced of anything he learns, and quits.

An Extension. It is surprising that first order conditions nailed down the optimal
experimentation level in Proposition 1. Apart from normal learning models, such regularity
properties on the demand for information are very rare. For instance, Radner and Stiglitz
(1984) have pointed out that with a one-shot information purchase, a 'non-concavity' in
the value of information may emerge: its marginal value is initially zero. In a work in
progress, we argue that when information can be repeatedly purchased, and its value is
endogenous, a well-behaved dynamic demand theory emerges. The resulting value of
information is concave in the quantity (and linear in our continuous time limit, underscored
on page 10), as economists prefer. Information is quite unlike other goods, being only
economically well-behaved if the decision maker can exercise an eternal option of
repurchase. This paper might be seen as an application of this general principle to a somewhat
historically important Bayesian decision problem.

Finally,  we  also  believe  that  this  same  framework  will  afford  a  dynamic  extension  of 
Blackwell's  (1953)  theorem  on  the  value  of  information  with  a  much  finer  ordering.  For 
large  purchases  of  very  weak  and  cheap  signals,  all  the  DM  cares  about  is  the  signal-to- 
noise  ratio.  A  careful  formulation  and  development  of  this  idea  awaits  our  future  work. 
APPENDICES

A.  PRIMARY  MATHEMATICAL  RESULTS 

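A.1 Properties Related to the Cost Function

Claim 1 The surplus g(n) is continuous, increasing, and unbounded given (★). Since
g(0) ≤ 0 (= 0 iff c(0) = 0), its inverse f = g⁻¹ is continuous and increasing, with f(0) ≥ 0.
Proof:

$$\begin{aligned}
g(n+\varepsilon) - g(n) &= [c(n) - c(n+\varepsilon)] + (n+\varepsilon)\,c'(n+\varepsilon) - n\,c'(n) \\
&\ge -\varepsilon\,c'(n+\varepsilon) + (n+\varepsilon)\,c'(n+\varepsilon) - n\,c'(n) \\
&= n\,c'(n+\varepsilon) - n\,c'(n) \;\ge\; n\lambda\varepsilon \quad \forall\,\varepsilon > 0. \qquad \square
\end{aligned}$$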

Claim 2 Given (★), the marginal cost of experimentation ξ(w) = c'(f(w)) is increasing,
strictly concave, and Lipschitz (i.e. locally, not globally so) in the return w = rv, for w > 0.

Proof: Clearly, ξ(0) ≥ 0 and ξ > 0 on (0, ∞). Since g(n) = nc'(n) − c(n) is
continuous and strictly increasing by Claim 1, its inverse f exists and is continuous. Let
c*(n*) = sup_n {n*n − c(n)} be the (Legendre-Fenchel) conjugate dual of the cost
function. By Theorems 12.2 and 26.3 in Rockafellar (1970) (R70), c* is convex since c is
convex, and strictly convex as its dual c** = c is smooth. His Theorem 23.5-d then yields
c*(n*) + c(n) = n*n for any subgradient n* of c at n (the Fenchel-Young equality). Taking
n* = c'(n), we have c*(c'(n)) + c(n) = nc'(n), or c*(c'(n)) = g(n). Finally, at n = f(w), we
get c*(ξ(w)) = c*(c'(f(w))) = g(f(w)) = w. The inverse relationship ξ = (c*)⁻¹ obtains, and
since c* is increasing and convex, ξ is increasing and concave. Finally, ξ is locally Lipschitz
away from w = 0, on the relative interior of its domain, by Theorem 10.4 of R70. □

Observe that even though the marginal cost function c'(n) need not be concave in n,
the function ξ(w) = c'(f(w)) is surprisingly concave in w. When c''(> 0) exists, this
is easily seen: For then f'(w) = [g'(f(w))]⁻¹ = [f(w)c''(f(w))]⁻¹ > 0, and consequently,
ξ'(w) = c''(f(w))f'(w) = 1/f(w) > 0 and ξ''(w) = −f'(w)/f²(w) < 0.
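
As a quick sanity check of these formulas, consider the quadratic cost c(n) = n² used earlier as an example. The marginal cost at the optimum is written ξ here; everything else follows from g(n) = n²:

```latex
\begin{aligned}
g(n) &= n\,c'(n) - c(n) = n^{2}, \qquad f(w) = \sqrt{w}, \qquad f'(w) = \frac{1}{2\sqrt{w}},\\
\xi(w) &= c'(f(w)) = 2\sqrt{w},\\
\xi'(w) &= \frac{1}{\sqrt{w}} = \frac{1}{f(w)} > 0,\\
\xi''(w) &= -\frac{1}{2\,w^{3/2}} = -\frac{f'(w)}{f(w)^{2}} < 0 .
\end{aligned}
```

Here c'(n) = 2n is linear in n, yet ξ is strictly concave in w, as the general argument predicts.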

Claim 3 Given (★), both the inverse surplus function f and √f are Lipschitz on (0, ∞).

Proof: If false, the Lipschitz inequality for f fails for w1 > w0 > 0 close. By the relations
w = g(f(w)) = g(y) = yc'(y) − c(y), plus the Lipschitz-down constant λ for c'(n),

$$\frac{g(f(w_1)) - g(f(w_0))}{f(w_1) - f(w_0)} = c'(f(w_0)) - \frac{c(f(w_1)) - c(f(w_0))}{f(w_1) - f(w_0)} + f(w_1)\,\frac{c'(f(w_1)) - c'(f(w_0))}{f(w_1) - f(w_0)} \qquad (9)$$

where the limit exists as c' exists and f is continuous, and is approached from below by
concavity of ξ = c'(f) (see Claim 2). The Lipschitz constant 1/(λf(w0)) ∈ (0, ∞) suffices. □

For √f, the LHS of (9) is divided by √f(w1) − √f(w0). Expressing this on the RHS as
[f(w1) − f(w0)]/[√f(w1) + √f(w0)], this yields the Lipschitz constant 1/(2λ(f(w0))^{3/2}). □

Claim 4 Given (★), the function ξ(w) is differentiable for all w > 0, with ξ'(w) = 1/f(w).
Proof: Since ξ is concave on (0, ∞), the right derivative ξ'(w+) exists by Theorem 24.1
of R70. Let

$$D(y) \equiv \frac{\xi(y) - \xi(w)}{y - w} - \frac{1}{f(w)}.$$

Clearly, lim_{y↓w} D(y) = ξ'(w+) − 1/f(w) exists. To see why ξ'(w+) = 1/f(w), observe that
using g(n) = nc'(n) − c(n) and g(f(w)) = w (as well as algebraic simplification), we have
the identity

$$D(y) = \frac{1}{f(w)}\,\frac{f(y) - f(w)}{y - w}\left[\frac{c(f(y)) - c(f(w))}{f(y) - f(w)} - c'(f(y))\right].$$

Given f continuous, and c differentiable, the [..] term vanishes as y − w ↓ 0. We only need
1/f bounded (f > 0 by Claim 1) and f Lipschitz (Claim 3). Similarly, ξ'(w−) = 1/f(w). □

Claim 5 Given (★), Ξ(p, v) = ξ(rv)/Σ(p) is continuous and Lipschitz on (0,1) × (0, ∞).

Proof: By Claim 4, ξ' exists, so that Ξ is partially differentiable in v, with ∂Ξ/∂v =
rξ'(rv)/Σ(p) = r/[f(rv)Σ(p)] ∈ (0, ∞) for (p, v) ∈ (0,1) × (0, ∞). Similarly, ∂Ξ/∂p =
4ξ(rv)(2p − 1)/(ζp³(1 − p)³) ∈ (−∞, ∞) for (p, v) ∈ (0,1) × (0, ∞). □

A.2 Two-Point Boundary Value Problems

a. An ODE Problem with Value Matching and Smooth Pasting. Let ℰ𝒰
denote the following existence / uniqueness problem: For any strictly convex cost function
c ∈ C² with c(0), c'(0) ≥ 0, there exist unique p̲, p̄ with 0 < p̲ < p̄ < 1 and a C² function v
with v''(p) = Ξ(p, v) = c'(f(rv(p)))/Σ(p) = ξ(rv(p))/Σ(p), such that (6)-(7) hold (no
null action). With value matching and smooth pasting, this is not a standard, two-point
boundary value ODE, amenable to known free boundary results. It needs an ad hoc proof.

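Observe that $\bar p > \hat p$ in $\mathcal{EU}$, where $\pi$ is kinked at $\hat p$, since $v'' > 0$ (given $c' > 0$) and $0 < \underline p < \bar p < 1$. Towards solving $\mathcal{EU}$, we first consider a second order one point boundary value problem $\mathcal{IV}(\bar p)$ of the Cauchy type: For any fixed $\bar p \in (\hat p, 1)$, solve $v''_{\bar p}(p) = \Psi(p, v_{\bar p}(p))$ given $v_{\bar p}(\bar p) = \bar p\pi_B^H + (1 - \bar p)\pi_B^L > 0$ and $v'_{\bar p}(\bar p) = \pi_B^H - \pi_B^L > 0$. Clearly, any such $v_{\bar p}$ is $\mathcal{C}^2$.

We now start our attack on the $\mathcal{EU}$ problem, by first assuming the R&D payoffs.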

Claim 6 (a) For all $\bar p \in (\hat p, 1)$, $\mathcal{IV}(\bar p)$ has a unique solution $v_{\bar p}$, that can be continued on $(0,1) \times (0,\infty)$, with $v_{\bar p}$ and $v'_{\bar p}$ uniformly continuous in $\bar p$.

(b) For any $\bar p \in (\hat p, 1)$, $v_{\bar p}$ is positive, strictly convex, and Lipschitz either (i) on $(0, \bar p]$, or (ii) on $(p_0, \bar p]$, for $p_0 > 0$, with $v_{\bar p}(p_0) = \lim_{p \downarrow p_0} v_{\bar p}(p) = 0$.

Proof of (a): All claims follow given $\Psi$ continuous and Lipschitz on $(0,1) \times (0,\infty)$ (see Claim 5), after reducing the second order ODE to a two-dimensional system of first-order ODEs. See Theorems 1.1-2 and 2.1, and Observation 2 in §1.6 of Elsgolts (1970).

Proof of (b): Either $v_{\bar p} > 0$, and $v_{\bar p}$ is Lipschitz on $(0,1)$ as it continues left, or $v_{\bar p}$ Lipschitz (valid on an open set) fails at some supremum $p_0 \in (0, \bar p)$. If so, by Theorem 1-4.1 of Brock and Malliaris (1989), the domain of $v_{\bar p}$ can be compactified via $v_{\bar p}(p_0) = \lim_{p \downarrow p_0} v_{\bar p}(p) = 0$, because $\Psi$ is Lipschitz when $v_{\bar p}(p) > 0$, and $\Psi$ is continuous and bounded in $(0,1) \times [0, \bar p\pi_B^H + (1 - \bar p)\pi_B^L]$. Strict convexity follows from $v''_{\bar p} = \Psi(p, v(p)) > 0$ given $v_{\bar p}(p) > 0$.

Claim 7 (Limit Solution Behavior) (a) For any $\bar p > \hat p$ that is close enough to $\hat p$, there exists $p_0 > 0$ such that $v_{\bar p}(p_0) = \lim_{p \downarrow p_0} v_{\bar p}(p) = 0$, with $v_{\bar p} > 0$ and $v'_{\bar p} > 0$ on $(p_0, \bar p]$.
(b) There is a unique $\tilde p \in (\hat p, 1)$, such that for all $q \in (\tilde p, 1)$, a unique, strictly positive solution $v_q$ exists on $(0, q]$. Also, $\lim_{p \downarrow 0} v'_q(p) = -\infty$ for all $q > \tilde p$.

Proof of (a): If $v_{\bar p} > 0$ on $(0, \bar p]$ for all $\bar p$ near $\hat p$, then for $p$ just below $\hat p$, by a Taylor expansion

$$v_{\bar p}(p) \approx v_{\bar p}(\bar p) - v'_{\bar p}(\bar p-)(\bar p - p) = v_{\bar p}(\bar p) - (\pi_B^H - \pi_B^L)(\bar p - p).$$

Since $v_{\bar p}(\bar p) = \bar p\pi_B^H + (1 - \bar p)\pi_B^L \downarrow \hat p\pi_B^H + (1 - \hat p)\pi_B^L = 0$ as $\bar p \downarrow \hat p$, continuity, the limit in Claim 6-b, case (ii), and $\pi_B^H - \pi_B^L > 0$ force $v_{\bar p}(p) < 0$ for $\bar p$ near $\hat p$, a contradiction.

Next let $p_0$ be the supremum (and largest, by continuity) point where $v_{\bar p}(p_0) = 0$. We claim that $v'_{\bar p}(p_0) > 0$. If not, then $v_{\bar p}(p_0 + \varepsilon) < 0$ for small $\varepsilon > 0$. Since $v'_{\bar p}(\bar p) = \pi_B^H - \pi_B^L > 0$ by construction, $v_{\bar p}$ crosses the $p$ axis at some $q_0 \in (p_0, \bar p)$, with non-negative slope. But then $v_{\bar p}(q_0) = 0 \le v'_{\bar p}(q_0)$, and $q_0 > p_0$ has all the properties of $p_0$, contrary to $p_0$ maximal. Finally, since $v''(p) = \Psi(p, v) > 0$, the slope $v'_{\bar p} > 0$ on $(p_0, \bar p]$, as claimed. □
Proof of (b): Consider any $p > \max(0, p_0)$, i.e. wherever the solution $v_q(p)$ uniquely exists.

$$v'_q(p) = \pi_B^H - \pi_B^L - \int_p^q \Psi(z, v_q(z))\,dz = \pi_B^H - \pi_B^L - \frac{2}{\zeta^2}\int_p^q \frac{\psi(rv_q(z))}{z^2(1 - z)^2}\,dz$$

Because $v_q(z) \ge \varepsilon > 0$ for all $z \in [p, q] \subset (0,1)$, this last integral strictly exceeds $\psi(r\varepsilon)\int_p^q z^{-2}(1 - z)^{-2}\,dz$, and thus blows up either as $q \uparrow 1$, or as $p \downarrow 0$ for any $q > p$. So $\lim_{q \uparrow 1} v'_q(p) = -\infty$ for fixed $p$, and $\lim_{p \downarrow 0} v'_q(p) = -\infty$ for $q > p$. Since $v'_q(q) = \pi_B^H - \pi_B^L > 0$, $v_q(p) > 0$ is strictly convex and U-shaped in $(0, q]$ for $q$ close enough to 1.

Claim 8 (Monotonicity Properties) The solution $v_{\bar p}$ rises and flattens as $\bar p$ rises: For all threshold pairs $\bar p_1 < \bar p_2$ in $(\hat p, 1)$, we have (a) $v_{\bar p_1} < v_{\bar p_2}$, and (b) $v'_{\bar p_1} > v'_{\bar p_2}$.
Proof: Suppose (a) fails at some $p' < \bar p_1$, say the largest $p'$ where $v_{\bar p_1}, v_{\bar p_2}$ cross: $\Delta v(p') = v_{\bar p_1}(p') - v_{\bar p_2}(p') = 0$. Because $\Delta v < 0$ on $(p', \bar p_1)$, this requires $\Delta v'(p') < 0$. Since $\bar p_2 > \bar p_1$, strict convexity of $v_{\bar p_2}$ in $(\bar p_1, \bar p_2)$ yields $v'_{\bar p_1}(\bar p_1) = \pi_B^H - \pi_B^L = v'_{\bar p_2}(\bar p_2) > v'_{\bar p_2}(\bar p_1)$, and so $\Delta v'(\bar p_1) > 0$. By continuity of $\Delta v'$, $\Delta v'(p'') = 0$ for some $p'' \in (p', \bar p_1)$, i.e. $v'_{\bar p_1}(p'') = v'_{\bar p_2}(p'')$, or

$$v'_{\bar p_1}(\bar p_1) - \int_{p''}^{\bar p_1} \Psi(z, v_{\bar p_1}(z))\,dz = v'_{\bar p_2}(\bar p_2) - \int_{p''}^{\bar p_2} \Psi(z, v_{\bar p_2}(z))\,dz \qquad (10)$$

Cancelling $v'_{\bar p_1}(\bar p_1) = \pi_B^H - \pi_B^L = v'_{\bar p_2}(\bar p_2)$ from both sides of (10), the integrals on either side of (10) are equal. But this is impossible, because $\bar p_1 < \bar p_2$, and $\Psi(z, v_{\bar p_1}(z)) < \Psi(z, v_{\bar p_2}(z))$ since $v_{\bar p_1}(z) < v_{\bar p_2}(z)$ given $\Delta v(z) < 0$ for $z > p'' > p'$.

Finally, part (b) follows from $v'_{\bar p}(p) = (\pi_B^H - \pi_B^L) - \int_p^{\bar p} \Psi(z, v_{\bar p}(z))\,dz$. For the domain of integration rises with $\bar p$, and thus so does the integral, given $\Psi > 0$. And we've just shown that $v_{\bar p}(z)$, and thus $\Psi(z, v_{\bar p}(z))$, rises uniformly for $z$ in the domain of the solution. □
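The monotonicity in Claim 8 can be illustrated numerically. The sketch below is not the paper's method: it integrates $v'' = \Psi(p, v)$ leftward from two thresholds with a hand-written Runge-Kutta scheme, under assumed values (quadratic cost $c(n) = n^2/2$ so that $\psi(w) = \sqrt{2w}$, $r = 1$, $\zeta = 2$, and R&D-style payoffs $\pi_B^H = 1$, $\pi_B^L = -1$). All names and parameter values are hypothetical.

```python
import math

R = 1.0                  # discount rate r (illustrative)
ZETA = 2.0               # zeta = 2*mu/sigma (illustrative)
PI_H, PI_L = 1.0, -1.0   # hypothetical payoffs of action B in states H and L

def sigma_fn(p):
    # Sigma(p) = zeta^2 p^2 (1 - p)^2 / 2
    return 0.5 * ZETA**2 * p**2 * (1.0 - p)**2

def big_psi(p, v):
    # Psi(p, v) = psi(r v) / Sigma(p), with psi(w) = sqrt(2 w) for c(n) = n^2 / 2
    return math.sqrt(2.0 * R * max(v, 0.0)) / sigma_fn(p)

def solve_left(p_bar, p_end, steps=2000):
    """RK4 for v'' = Psi(p, v), integrated leftward from p_bar to p_end.

    Initial data: v(p_bar) = pi_B(p_bar) and v'(p_bar) = PI_H - PI_L.
    Returns the pair (v(p_end), v'(p_end)).
    """
    h = (p_end - p_bar) / steps          # negative step: we move left
    p = p_bar
    v = p_bar * PI_H + (1.0 - p_bar) * PI_L
    dv = PI_H - PI_L
    for _ in range(steps):
        k1v, k1d = dv, big_psi(p, v)
        k2v, k2d = dv + 0.5 * h * k1d, big_psi(p + 0.5 * h, v + 0.5 * h * k1v)
        k3v, k3d = dv + 0.5 * h * k2d, big_psi(p + 0.5 * h, v + 0.5 * h * k2v)
        k4v, k4d = dv + h * k3d, big_psi(p + h, v + h * k3v)
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        dv += h * (k1d + 2 * k2d + 2 * k3d + k4d) / 6.0
        p += h
    return v, dv

v1, dv1 = solve_left(0.7, 0.6)   # lower threshold p_bar_1 = 0.7
v2, dv2 = solve_left(0.8, 0.6)   # higher threshold p_bar_2 = 0.8
print("values at p=0.6:", v1, v2)
print("slopes at p=0.6:", dv1, dv2)
```

With these assumed numbers, the solution started from the higher threshold should lie above and be flatter than the one started from the lower threshold at the common belief 0.6, in line with parts (a) and (b).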

Theorem 1 ($\mathcal{EU}$ Existence/Uniqueness) Given ($\star$), a unique $\bar p < 1$ exists s.t. $\mathcal{IV}(\bar p)$ has a unique solution $v_{\bar p}$ for a unique $\underline p > 0$ obeying the boundary conditions (6), (7) of $\mathcal{EU}$.

Proof: Choose $q > \tilde p$ close enough to 1 that $v_q(p)$ uniquely exists and is strictly positive on $(0, q]$. As $q$ decreases, $v_q(p)$ uniformly shifts down by Claim 8. But as long as $v_q > 0$ over $(0, q]$, it is U-shaped by Claim 7-b, with $v'_q(0+) = -\infty < 0 < \pi_B^H - \pi_B^L = v'_q(q)$. Since it is also strictly convex by Claim 6-b, with exactly one global minimum in $(0, q)$, it can first cross the horizontal axis only in the way required by $\mathcal{EU}$. The assertion then fails only if $v_q$ always stays strictly positive for all $q$, which we ruled out in Claim 7-a.

Finally, the above argument only made essential use of the rising $\pi_B(p)$ curve. A falling $\pi_A(p)$ line could easily substitute for the horizontal axis, which was thus WLOG. □

b. An ODE Problem with Value Matching Alone.

Claim 9 Fix $\underline p' < \hat p < \bar p'$. The two-point boundary value problem $v''(p) = \Psi(p, v(p))$ s.t. $v(\underline p') = \pi(\underline p')$ and $v(\bar p') = \pi(\bar p')$ has a solution $v$.

Proof: Consider the Cauchy problem $v_s''(p) = \Psi(p, v_s(p))$ s.t. $v_s(\bar p') = \pi(\bar p') > 0$, $v_s'(\bar p') = s$, for some (slope) $s \in \mathbb{R}$. By the properties of $\Psi$ in Claim 5, a unique solution $v_s$ to this problem exists locally for each $s \in \mathbb{R}$, is uniformly continuous in $s$, and can be extended left as long as it remains positive: $v_s(p) = \pi(\bar p') + s(p - \bar p') + \int_p^{\bar p'}\!\int_x^{\bar p'} \Psi(y, v_s(y))\,dy\,dx$. Next,

$$v_{s_2}(p) - v_{s_1}(p) = (s_2 - s_1)(p - \bar p') + \int_p^{\bar p'}\!\!\int_x^{\bar p'} [\Psi(y, v_{s_2}(y)) - \Psi(y, v_{s_1}(y))]\,dy\,dx < 0 \qquad (11)$$

for every $s_2 > s_1$, and all $p$, where both solutions $v_{s_1}, v_{s_2}$ exist. This obtains because $0 = v_{s_2}(\bar p') - v_{s_1}(\bar p') < s_2 - s_1 = v'_{s_2}(\bar p') - v'_{s_1}(\bar p')$. Inequality then holds by continuity for $p$ near $\bar p'$; and so $\Psi(p, v_{s_2}(p)) < \Psi(p, v_{s_1}(p))$ by monotonicity of $\Psi$; continuing leftward, $v_{s_2} < v_{s_1}$. Hence, $\lim_{s \to \pm\infty} v_s(p)$ exists for all $p < \bar p'$ provided $v_s(p) > 0$. In fact, $\lim_{s \to -\infty} v_s(p) = \infty$ because $v_{s_2}(p) - v_s(p) < (s_2 - s)(p - \bar p')$ from (11), provided $v_s, v_{s_2} > 0$. Likewise, $v_s(p) = 0$ for large enough $s$. Therefore by continuity and monotonicity of $v_s(\cdot)$ in $s$, for all $p < \bar p'$ there exists $s$ with $v_s(p) = \pi(p) \in [0, \infty)$, and in particular for $p = \underline p'$. □
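The proof of Claim 9 is in effect a shooting argument over the terminal slope $s$. A minimal sketch of that idea follows, using bisection on $s$ and a hand-written Runge-Kutta integrator. It assumes a quadratic cost $c(n) = n^2/2$ (so $\psi(w) = \sqrt{2w}$), $r = 1$, $\zeta = 2$, and a hypothetical symmetric payoff $\pi(p) = |2p - 1|$; none of these values come from the paper.

```python
import math

R, ZETA = 1.0, 2.0     # illustrative discount rate r and zeta = 2*mu/sigma

def payoff(p):
    # Hypothetical symmetric stopping payoff pi(p) = |2p - 1|, kinked at p = 1/2
    return abs(2.0 * p - 1.0)

def big_psi(p, v):
    # Psi(p, v) = psi(r v) / Sigma(p); psi(w) = sqrt(2 w); Sigma = zeta^2 p^2 (1-p)^2 / 2
    return math.sqrt(2.0 * R * max(v, 0.0)) / (0.5 * ZETA**2 * p**2 * (1.0 - p)**2)

def shoot(s, p_hi, p_lo, steps=1000):
    """Integrate v'' = Psi(p, v) leftward from p_hi, with v(p_hi) = pi(p_hi), v'(p_hi) = s.

    Returns v(p_lo), computed with classical RK4.
    """
    h = (p_lo - p_hi) / steps            # negative step: we move left
    p, v, dv = p_hi, payoff(p_hi), s
    for _ in range(steps):
        k1v, k1d = dv, big_psi(p, v)
        k2v, k2d = dv + 0.5 * h * k1d, big_psi(p + 0.5 * h, v + 0.5 * h * k1v)
        k3v, k3d = dv + 0.5 * h * k2d, big_psi(p + 0.5 * h, v + 0.5 * h * k2v)
        k4v, k4d = dv + h * k3d, big_psi(p + h, v + h * k3v)
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        dv += h * (k1d + 2 * k2d + 2 * k3d + k4d) / 6.0
        p += h
    return v

def solve_bvp(p_lo, p_hi, s_min=0.0, s_max=5.0, iters=60):
    """Bisection on the slope s so that v_s(p_lo) matches pi(p_lo)."""
    target = payoff(p_lo)
    for _ in range(iters):
        s_mid = 0.5 * (s_min + s_max)
        if shoot(s_mid, p_hi, p_lo) > target:
            s_min = s_mid    # v_s(p_lo) too high: a steeper terminal slope is needed
        else:
            s_max = s_mid
    return 0.5 * (s_min + s_max)

s_star = solve_bvp(0.3, 0.7)
v_left = shoot(s_star, 0.7, 0.3)
print("slope at upper point:", s_star, " value at lower point:", v_left)
```

Bisection is used here because $v_s(p)$ falls as $s$ rises, as inequality (11) asserts; the bracket $[0, 5]$ for $s$ is an assumption that happens to suit these illustrative parameters.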
B. DERIVATION OF OPTIMAL CONTROL FOUNDATIONS

This appendix rigorously formulates the control/stopping problem and the sequence of proofs that establish the existence and uniqueness properties of its solution. We first summarize our plan of attack, and highlight the absence of any logical circularity. We ignore the null action throughout, as its additional difficulties (with a horizontal tangency because $\pi = 0$ somewhere) are fully captured by the R&D case: a vanishing intensity level.

§B.1. We progressively restrict the control process to simpler domains. First, we posit the SDE met by the observation process $(x_t)$, given any control functional $n(\cdot)$ of past observations. Next, we derive the Bayes filter SDE describing the evolution of $(p_t)$. Then we show how to formulate the OCS problem with current beliefs $p_t$ as state variable: Since the best existing theory does not do this for joint optimal control and stopping, we first find weak conditions to write the value function as $V = V(p)$, and then prove Lemma 1 ($V$ convex in $p$); this yields a simple belief exit set $[0, \underline p] \cup [\bar p, 1]$, and a Markov stopping time $T$. Finally, we show that we may restrict WLOG to Markov control policies $n(p)$.

§B.2. We establish key properties of the belief diffusion to prove that any candidate optimal policy of beliefs alone (as §B.1 allows) a.s. stops experimenting in finite time.

§B.3. Assuming $c(0) > 0$ or $\pi > 0$ everywhere (or both), we verify that the candidate optimal Markov control $n(p) = f(rv(p))$ from $\mathcal{EU}$ is admissible in the sense of §B.1, justifying uniqueness presumed so far. We then check that the unique solution to $\mathcal{EU}$ (and thus to HJB+ST) also solves OCS ($V = v$ works, with $\underline p, \bar p$), and also is its only solution.

§B.4. We specialize §B.3 to the R&D model with $c(0) = 0$ and $\pi = 0$ somewhere.

B.1 Developing the Markov Control Model

a. Drift and Noise Notation. Let $\{\Omega, \mathcal{F}, \mathcal{P}\}$ be the underlying probability space. Let $h_t(\omega)$ or $h(t, \omega)$ denote the same continuous time stochastic process $h = (h_t)_{t \ge 0}$, namely a collection of $\mathcal{F}$-measurable real-valued functions (random variables) $h: \mathbb{R}_+ \times \Omega \to \mathbb{R}$. We consider only processes with continuous sample paths, i.e. where $h(\cdot, \omega)$ is continuous for every $\omega \in \Omega$. We denote by $C_{[0,t]}$ the space of continuous functions on any time subset $[0, t] \subset [0, \infty)$; by $h^t(\omega) = ((s, h(s, \omega)))_{s \le t}$ the (graph of the) process restricted to $[0, t]$, whose (superscripted) elements are sample paths or trajectories in $C_{[0,\infty)}$; by $h^{t-}$ the restriction of $h^t$ to $[0, t)$; $(\mathcal{F}_t^h)$ is the filtration generated by $h$. Finally, $W = (W_t, \mathcal{F}_t^W)$ is the standard Wiener process or standard Brownian Motion (BM) with zero drift and unit variance, adapted only to $(\mathcal{F}_t^W)$, and (canonically) with continuous sample paths.

Let $p_0 \in (0,1)$. Think of Nature as initially independently drawing a drift $m \in \{\mu, -\mu\}$, with chances $p_0, 1 - p_0$, and a BM noise path realization $W^\infty \in C_{[0,\infty)}$. To render our problem amenable to existing results in nonlinear filtering theory, we must define an underlying unobserved process that the $\mathcal{DM}$ wishes to infer (not simply the stationary drift), and an observable process that provides noisy information on the former (the controlled signal). This construction works even though the unobservable process is time-invariant: see e.g. Oksendal (1995) (hereafter, O95), Example 6.11.

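Put $\Omega = (0, 2]$, and endow it with the Borel $\sigma$-algebra $\mathcal{F} = \mathcal{B}_{(0,2]}$, and partition $\Omega = \Omega_L \cup \Omega_H = (0, 1] \cup (1, 2]$. Let $W: \mathbb{R}_+ \times (0, 1] \to \mathbb{R}$ be a standard BM on $\{(0, 1], \mathcal{B}_{(0,1]}, \mathcal{P}_W\}$, where $\mathcal{P}_W$ and $W$ exist by the Kolmogorov extension theorem. Then $W$ extends to a standard BM on $\{(1, 2], \mathcal{B}_{(1,2]}, \mathcal{P}_W\}$ by translation. Next, define probability measures $Q(\omega) = p_0 1_{\omega \in \Omega_H} + (1 - p_0) 1_{\omega \in \Omega_L}$ and $\mathcal{P}(\omega) = Q(\omega)\mathcal{P}_W(\omega)$ on $\{\Omega, \mathcal{F}\}$. Finally choose $\mu > 0$ and define the time-invariant process $m: \mathbb{R}_+ \times \Omega \to \{-\mu, \mu\}$ such that $m_t(\omega) = m(\omega) = \mu$ for all $\omega \in \Omega_H$ and $m(\omega) = -\mu$ otherwise. Note the equivalence: payoff-relevant state $\theta \in \{H, L\}$ $\leftrightarrow$ drift $m \in \{\mu, -\mu\}$ $\leftrightarrow$ partition element $\Omega_\theta \in \{\Omega_H, \Omega_L\}$.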

The non-anticipatory control functional of time and sample paths $n: \mathbb{R}_+ \times C_{[0,\infty)} \to \mathbb{R}_{++}$ satisfies (i) $n(t, h^\infty) = n(t, h^t)$ for all $t$,¹² and (ii) $n(t, h^t(\omega))$ is also $\mathcal{F}_t^h$-adapted for each $\mathcal{F}_t^h$-adapted process $h$. Fix $\sigma > 0$. The controlled signal process $(x_t(\omega))$ is a diffusion with $x_0(\omega) = 0$, solving:

$$dx_t(\omega) = m(\omega)\,dt + \frac{\sigma}{\sqrt{n(t, x^t(\omega))}}\,dW_t(\omega). \qquad (12)$$

b. The Continuous-time Non-Linear Bayes Rule Filter. Given state-contingent payoffs and vNM preferences, the $\mathcal{DM}$'s expected payoff from stopping depends on the controlled signal diffusion only via the posterior probability of $m = \mu$ (state $H$), given time and past observations of the signal and own control, $p_t = \mathcal{P}(m = \mu \mid p_0, t, x^t, n^{t-})$.

Our model, with the constant state process $m$ and signal process obeying (12), meets all conditions of Theorem 9.1 in LS77 provided that there exists a unique strong solution to the SDE (given a prespecified Wiener process $(W_t)$), established in §B.3. Their result, a special case of the so-called 'fundamental theorem of non-linear filtering', asserts that: (i) the process $p_t = \mathcal{P}(\Omega_H \mid \mathcal{F}_t^x, n^{t-})$ obeys $p_0(\omega) = p_0$, and solves the SDE:¹³

$$dp_t(\omega) = p_t(\omega)(1 - p_t(\omega))\,\frac{\mu - (-\mu)}{\sigma}\,\sqrt{n(t, x^t(\omega))}\,d\bar W_t(\omega) \qquad (13)$$

$$\bar W_t(\omega) = \frac{1}{\sigma}\int_0^t \sqrt{n(s, x^s(\omega))}\,\{dx_s(\omega) - [p_s(\omega)\mu + (1 - p_s(\omega))(-\mu)]\,ds\}$$

and that (ii) $\bar W = (\bar W_t, \mathcal{F}_t^x)$ is a Wiener process, but not a standard BM. That is, it is adapted not only to its own filtration, but also to the smaller one $(\mathcal{F}_t^x)$.
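A discrete-time analogue may help fix ideas about the filter. Over a short period of length $dt$ at experimentation level $n$, the signal increment implied by (12) is normal with mean $m\,dt$ and variance $\sigma^2 dt/n$, so Bayes' rule updates the belief by a likelihood ratio. The sketch below (with hypothetical parameter values and function names, not taken from the paper) checks by numerical integration that the expected posterior equals the prior, which is the martingale property of beliefs that the driftless form of (13) expresses.

```python
import math

MU, SIGMA = 1.0, 1.0    # illustrative drift magnitude mu and noise scale sigma
DT = 0.01               # illustrative period length

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior(p, dx, n):
    """Bayes update of p = Pr(m = +MU) after observing the increment dx at level n."""
    var = SIGMA ** 2 * DT / n
    like_h = normal_pdf(dx, MU * DT, var)
    like_l = normal_pdf(dx, -MU * DT, var)
    return p * like_h / (p * like_h + (1.0 - p) * like_l)

def expected_posterior(p, n, grid=4001, width=10.0):
    """E[posterior] under the predictive (mixture) density, by the trapezoid rule."""
    var = SIGMA ** 2 * DT / n
    sd = math.sqrt(var)
    lo, hi = -MU * DT - width * sd, MU * DT + width * sd
    h = (hi - lo) / (grid - 1)
    total = 0.0
    for i in range(grid):
        x = lo + i * h
        mix = p * normal_pdf(x, MU * DT, var) + (1.0 - p) * normal_pdf(x, -MU * DT, var)
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * posterior(p, x, n) * mix * h
    return total

results = {(p, n): expected_posterior(p, n) for p in (0.2, 0.5, 0.9) for n in (0.5, 4.0)}
for (p, n), value in results.items():
    print(f"prior={p:.1f}  n={n:.1f}  E[posterior]={value:.10f}")
```

A higher level $n$ lowers the increment's variance and so spreads the posterior more widely around the prior, but it leaves the mean of the posterior unchanged.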

c.  Beliefs  as  a  Sufficient  Statistic. 


¹²We shall choose $h = x$, and soon after $h = p$. Of course, at time $t$, we need the joint history of $x^t$ and $n^{t-}$ to compute the current optimal control $n_t$. However, given the functional $n$ and $x^t$, the control history $n^{t-}$ is implicitly recursively embedded; therefore, we shall omit it altogether for notational simplicity.

¹³More recently, Bolton and Harris (1993) provide alternative insights into this formula.



• The Control as a Function Just of Belief History. Since we want to work with beliefs rather than signals as a state variable, we establish:

Claim 10 There is a bijection between information sets $(p^t, n^{t-})$ and $(x^t, n^{t-})$. Hence, $p^t$ is sufficient for $x^t$, and we may WLOG replace $n(t, x^t(\omega))$ by $n(t, p^t(\omega))$.

Proof: We first prove that $(x^t, n^{t-}) \mapsto (p^t, n^{t-})$ is an injection. Fix the control functional $n$, and states $\omega, \omega' \in \Omega$ whose resulting signal paths differ: $x_s(\omega) \ne x_s(\omega')$ for some $s \in [0, t]$. By continuity, this holds on a time set of positive measure. Then we either have (i) $n(t, x^t(\omega)) = n^{t-}(\omega) = n^{t-}(\omega') = n(t, x^t(\omega'))$, and given the filter (13), belief paths differ, $p^t(\omega) \ne p^t(\omega')$, or (ii) $n^{t-}(\omega) \ne n^{t-}(\omega')$. Either way, $\{p^t(\omega), n^{t-}(\omega)\} \ne \{p^t(\omega'), n^{t-}(\omega')\}$.

That $\{p^t, n^{t-}\} \mapsto \{x^t, n^{t-}\}$ is an injection follows from strong existence and uniqueness of the diffusion $p_t(\omega)$, which imply that every signal process path $x^t(\omega)$ and associated control process $n(t, x^t(\omega))$ yield a unique belief process $p_t(\omega)$ via the filter. □

• The Value as a Function Just of Current Beliefs. Given the absence of suitable sufficiency theorems for optimal control joint with optimal stopping (Krylov (1980) being the best source here), we proceed indirectly, via the value function.

Claim 11 The supremum (4) can be written as $V(p)$, a function of the current belief $p$.

Proof: The random variable $T: \Omega \to [0, \infty]$ is a Markov time relative to $(\mathcal{F}_t)$ if $\{\omega : T(\omega) \le t\} \in \mathcal{F}_t$ for all $t \ge 0$. The $\mathcal{DM}$ must optimally stop the one-dimensional controlled process $(p_t(\omega))$ solving (13) for an adapted control process $n_t(\omega)$ and a Markov time $T(\omega)$, w.r.t. the measure $\mathcal{P}(\omega)$ on $\Omega$. The general results in Krylov (1980) assume a continuous terminal reward function $\pi$ (like us; his more powerful verification theorems require $\pi \in \mathcal{C}^2$). His Theorem 3.1.9 provides a recursive equation for $V(t, p_t \mid T)$, namely the supremum value at time $t$ of the length-$T$ horizon problem. His Theorem 3.1.10 states that this upper bound is achieved via a feedback control functional $n(t, p^t)$. Next, by Theorem 6.4.4, since $\lim_{T \to \infty} E_t[e^{-r(T - t)}\pi(p_T)] = 0$ for all $t \ge 0$ and adapted control processes $n_t$, the value of the infinite horizon problem is $V(t, p_t) = \lim_{T \to \infty} V(t, p_t \mid T)$. Finally, by his Remark 6.4.13, $V(t, p_t) = V(p_t)$ for all $t$, since our $\Sigma(\cdot)$, $c(\cdot)$, $\pi(\cdot)$, and $r$ are stationary. □

• Belief Thresholds and Value Function Convexity: Proof of Lemma 1. We can now prove that the optimal stopping decision is described by a stopping set in current belief space $[0, 1]$. By Claim 11, the supremum value can be written as $V(p)$. By using a policy that is $\varepsilon$-optimal for the $V(p_0)$ problem (i.e. it achieves a payoff $\ge V(p_0) - \varepsilon$) for any $p \in (p_1, p_2)$, the $\mathcal{DM}$ can ensure himself an expected payoff that is affine in $p$. For such a strategy yields a constant payoff in either state $\theta = H, L$, and adjusting $p$ merely weights these payoffs, since the Wiener probability law $\mathcal{P}_W$ is independent of the $\omega$ partition element $\{\Omega_\theta\}$, while $m = \pm\mu$ in states $H, L$. Thus, $V$ everywhere weakly exceeds some supporting $\varepsilon$-tangent line at $p$. As $\varepsilon > 0$ and $p$ are arbitrary, $V$ is convex.

Next, define stopping sets $S_A$, $S_B$, and $S_0$ for $A$, $B$, and the null action. Since point beliefs are stationary for (13), $V(0) = \pi(0)$ and $V(1) = \pi(1)$. As stopping is always an option, we have $V \ge \pi$; equality $V(p) = \pi(p)$ holds iff $p \in S_A \cup S_B \cup S_0$. If ever $V(p) = \pi$ so that $p \in S_A$, then convexity, $V \ge \pi$, and $0 \in S_A$, together force $V = \pi$ on $[0, p]$. Thus, $S_A = [0, \underline p]$, and similarly $S_B = [\bar p, 1]$ for some $0 < \underline p < \bar p < 1$. Similarly, since $\pi_A^L > 0$ and $\pi_B^H > 0$, the null action is never exercised at $p = 0$ and $p = 1$, so that $S_0 = [\underline p_0, \bar p_0]$. □

• The Control as a Function Just of Current Beliefs. We have now proven that the stopping rule is characterized by an exit set in the space $[0, 1]$ of current beliefs. We may therefore simplify the supremum operator in (4) as $\sup_{T, n} = \sup_{\underline p, \bar p}[\sup_n]$. For each $\{\underline p, \bar p\}$, resolving the inner supremum is crucially a pure control exercise. Under very weak conditions guaranteed by our simple two-point boundaries and boundedly finite maximand, for any optimal $\mathcal{F}_t^p$-adapted control, there exists a Markov control that performs as well, by Theorem 11.3 in O95. Hence, only the current belief $p_t$ matters for control. Hereafter, we restrict to stationary Markov controls, of the type $n(t, p^t(\omega)) = n(p_t(\omega))$, where $n: [0, 1] \to \mathbb{R}_+$ is a $\mathcal{B}_{[0,1]}$-measurable function. Thus, we consider beliefs as the solution to the SDE $dp_t = \sqrt{2\Sigma(p_t)n(p_t)}\,d\bar W_t$ (given $p_0$), on any continuation set $\emptyset \ne (\underline p, \bar p) \subset [0, 1]$.

Definition An admissible stationary Markov control policy $n \in M$ is a strictly positive $\mathcal{B}_{[0,1]}$-measurable function $n: [0, 1] \to \mathbb{R}_{++}$ yielding a unique strong solution to (12)-(13).

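d. The Properly Formulated Optimization Problem. Replacing (4) given (2), the OCS problem when WLOG restricting to stationary Markov controls is therefore now:

$$V(p_0) = \sup_{n(\cdot) \in M;\ \underline p, \bar p \in [0,1]} \int_\Omega \left[ -\int_0^{T(\omega \mid \underline p, \bar p)} c(n(p_t(\omega)))e^{-rt}\,dt + e^{-rT(\omega \mid \underline p, \bar p)}\pi(p_{T(\omega \mid \underline p, \bar p)}) \right] d\mathcal{P}(\omega) \qquad (14)$$

$$\text{s.t.} \quad T(\omega \mid \underline p, \bar p) = \inf\{t \ge 0 : p_t(\omega) \le \underline p \text{ or } p_t(\omega) \ge \bar p\}$$

$$p_t(\omega) = p_0 + \int_0^t \sqrt{2n(p_s(\omega))\Sigma(p_s(\omega))}\,d\bar W_s(\omega) \qquad (15)$$

$$\bar W_t(\omega) = \frac{1}{\sigma}\int_0^t \sqrt{n(p_s(\omega))}\,\{dx_s(\omega) - [p_s(\omega)\mu + (1 - p_s(\omega))(-\mu)]\,ds\}$$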

e. Brief Aside: Making Sense of the Bayesian Model of §4.2. In the text, we formulated this result without measure theory for illustrative purposes. Yet our summary was not without basis. We defined a conditional signal process $(x_t^\theta)$, which we can now more generally write $dx_t^\theta(\omega) = 1_{\omega \in \Omega_\theta}[\mu^\theta(\omega)\,dt + (\sigma/\sqrt{n_t})\,dW_t(\omega)]$, for $\theta = L, H$; the belief driving force $d\bar W_t$, from the unconditional point of view of the $\mathcal{DM}$, was really the mixture of two maps, $\bar W_t(\omega) = p_t(\omega)\bar W_t^H(\omega) + (1 - p_t(\omega))\bar W_t^L(\omega)$, where (3) is equivalent to this definition if premultiplied by the indicator function $1_{\omega \in \Omega_\theta}$ of $\Omega_\theta$ (suppressed in the text, along with $\omega$).
B.2 Experimentation Ends Almost Surely in Finite Time

We now assume that a stationary optimal policy $(n(\cdot), \underline p, \bar p)$ exists, with $n(\cdot)$ admissible. We then show that experimentation a.s. ends in finite time, as Theorem 5 will need. Note that positivity, demanded by admissibility, is not a restriction on optimality. For if $n(p') = 0$ at $p' \in (\underline p, \bar p) \subset [0, 1]$, then $dp|_{p'} = \sqrt{2n(p')\Sigma(p')}\,d\bar W = 0$ since $\Sigma < \infty$, contrary to $p' \in \mathcal{E}$.

a. Background Terminology, Notation, and Theory. Here, we hew closely to KT81, §15.6-7, and Karatzas and Shreve (1991) (KS91), §5.5.C (adapting both for clarity, and to avoid overuse of letters). Below, $p_0$, $p$, $y$, and $z$ are arbitrary points in $(\underline p, \bar p)$. The hitting time of $z \in [\underline p, \bar p]$ from $p_0$ is the random variable $T_{p_0 z} = \inf\{t \ge 0 : p_t = z \mid p_0\}$. Next, the diffusion $(p_t)$ is regular if, for any pair of interior points $y, z \in (\underline p, \bar p)$, $y$ can be reached in finite time with positive probability (hereafter, FTPP) from $z$, or $\Pr(T_{zy} < \infty) > 0$. This is a critical notion, analogous to the communicating property for Markov processes.

We now define several important integrals of $1/n(p)$, needed both here and in §B.3, B.4. Though they may be unbounded, they are well-defined, because the control $n(\cdot) > 0$ is $\mathcal{B}_{[0,1]}$-measurable. Given a general belief diffusion process $dp = \beta(p)\,dt + \alpha(p)\,dW$ with range $(\underline p, \bar p) \subset (0, 1)$, the scale function of KT81, KS91 is $S(y) = \int^{y} \exp\{-2\int^{x} \beta(z)\,dz/\alpha^2(z)\}\,dx$. By KT81's two convenient abuses of notation, $S[y, p] = S(p) - S(y)$ denotes the scale measure, and $S(\underline p, p] = \lim_{y \downarrow \underline p} S[y, p]$ its left-side limit. The boundary $\underline p$ is attracting if $S(\underline p, p] < \infty$, independently of $p \in (\underline p, \bar p)$. A similar definition applies at $\bar p$. By an equivalent formulation in KT81's Lemma 6.1, an attracting boundary is strictly speaking non-repelling: $(p_t)$ approaches the boundary (say $\underline p$) arbitrarily close with positive chance starting from any interior point $p_0$, before any larger interior $b > p_0$ is hit.

As an attracting boundary might not be reached in FTPP, we need a harsher concept: Similar to the scale function, define the speed function $M(y) = \int^{y} [S'(z)\alpha^2(z)]^{-1}\,dz$, the speed measure $M[y, p]$, and $M(\underline p, p] = \lim_{y \downarrow \underline p} M[y, p]$. Next, let $J[y, p] = \int_y^p S[y, z]M'(z)\,dz$ and $K[y, p] = \int_y^p S[z, p]M'(z)\,dz$, with right limits $J(\underline p, p]$ and $K(\underline p, p]$. The boundary $\underline p$ is attainable if $J(\underline p, p] < \infty$ for any $p \in (\underline p, \bar p)$. Intuitively, by Lemma 6.2 in KT81 this means $\Pr(T_{p\underline p} < \infty) > 0$. An attainable boundary is thus not just approached but hit in FTPP.
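To see how the attainability integral behaves, the following sketch evaluates $J[y, p]$ for the driftless belief diffusion under a hypothetical constant level $n$ (an assumption for illustration only; the optimal $n(p)$ is not constant). With $\beta = 0$ we have $S[y, z] = z - y$ and $M'(z) = 1/(2n\Sigma(z)) = 1/(n\zeta^2 z^2(1 - z)^2)$. The values $n = 1$ and $\zeta = 1$ are likewise illustrative.

```python
import math

N_LEVEL = 1.0   # hypothetical constant experimentation level n
ZETA = 1.0      # illustrative zeta = 2*mu/sigma

def speed_density(z):
    # M'(z) = 1 / (S'(z) alpha^2(z)), with S'(z) = 1 and
    # alpha^2(z) = 2 n Sigma(z) = n zeta^2 z^2 (1 - z)^2
    return 1.0 / (N_LEVEL * ZETA ** 2 * z ** 2 * (1.0 - z) ** 2)

def J(y, p, steps=20000):
    """J[y, p] = int_y^p S[y, z] M'(z) dz with S[y, z] = z - y.

    Uses the midpoint rule in the variable u = log z, so dz = z du.
    """
    a, b = math.log(y), math.log(p)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        z = math.exp(a + (i + 0.5) * h)
        total += (z - y) * speed_density(z) * z * h
    return total

p = 0.5
values = {y: J(y, p) for y in (1e-2, 1e-4, 1e-6)}
for y, value in values.items():
    print(f"y={y:.0e}  J[y, 0.5] = {value:.4f}")
```

In this example $J[y, p]$ grows roughly like $\log(1/y)$ as $y$ falls toward 0, so the extreme belief 0 would not be attainable under a constant level, whereas $J$ stays finite for any interior lower endpoint.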

b. Attracting Boundaries. We explore the unconditional driftless ($\beta(p) = 0$) belief process, with diffusion term $\alpha(p) = \sqrt{2n(p)\Sigma(p)}$ in $(\underline p, \bar p)$. Then $S[y, p] = p - y$ as $n > 0$.

Claim 12 Assume $\pi > 0$ in $[0, 1]$, or $c(0) > 0$. For an optimal control $n(p)$, the belief process $(p_t = p_0 + \int_0^t \sqrt{2n(p_s)\Sigma(p_s)}\,d\bar W_s)$ is regular, and any triggers $0 < \underline p < \bar p < 1$ attracting.
Proof: That boundaries are attracting follows from $S(\underline p, p] < \infty$ and $S[p, \bar p) < \infty$, both essentially stated above. Regularity is proven by contradiction. Suppose that there exists a pair $y, z \in (\underline p, \bar p)$ with $\Pr(T_{yz} < \infty) = 0$. Obviously $y \ne z$; WLOG order $\underline p < y < z < \bar p$.
Proof Step 1: Regularity on Some Interval. Given sample path continuity and $y < z < \bar p$, all diffusion paths from $y$ to $\bar p$ first hit $z$, and so $\Pr(T_{yz} < T_{y\bar p}) = 1$. Hence, $\Pr(T_{y\bar p} < \infty) \le \Pr(T_{yz} < \infty) = 1 - 1 = 0$. But then $\Pr(T_{y\underline p} < \infty) > 0$, for otherwise starting at belief $y$, the $\mathcal{DM}$ strictly suboptimally experiments at cost without (a.s.) any prospect of positive discounted returns. Once more sample path continuity yields $\Pr(T_{y''y'} < \infty) \ge \Pr(T_{y''\underline p} < \infty) > 0$ for any $y' < y''$ with $[y', y''] \subset [\underline p, y]$. To wit, the diffusion process $(p_t)$ traverses down across any such $[y', y'']$ to $y'$ in FTPP. For regularity on $[y', y'']$, $(p_t)$ also must transit up in FTPP. Pick any $y' = \underline y$ in $(\underline p, y)$, and assume to the contrary that no $y'' > \underline y$ in $(\underline p, y)$ is hit in FTPP from $\underline y$. Then the belief process drifts down but not up in FTPP, which violates the martingale property.¹⁴

Proof Step 2: The Maximal Regularity Interval. By Step 1, $(p_t)$ is a regular process on a non-empty interval $[\underline y, y''] \subset [\underline p, y)$. We next claim that the largest connected regularity set with lower bound $\underline y$ is right-open, of the form $[\underline y, \bar y)$. To see why, we prove that if $(p_t)$ is regular on $[\underline y, y'']$, then it is also regular on $[y'', y'' + \eta]$ for small $\eta > 0$. Assume not. Specifically, suppose $\Pr(T_{y'', y'' + \eta} < \infty) = 0$ for all $\eta > 0$. Because $\Pr(T_{yz} < \infty) = 0$ with $\underline y < y$, and $\Pr(T_{\underline y y''} < \infty) > 0$, we have $y'' < z < \bar p$, and so $y'' + \eta < \bar p$ for small enough $\eta > 0$. As in Step 1, this contradicts the martingale property of $(p_t)$.

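Proof Step 3: Contradiction to Optimality. Since $\bar y$ is unattainable in FTPP starting from any $p \in [\underline y, \bar y)$, so is $\bar p > \bar y$. Then the $\mathcal{DM}$'s only reason to experiment is reaching the lower boundary $\underline p$ in FTPP. We now argue that this yields a negative payoff.

Since $S(\underline y, \bar y] = \bar y - \underline y < \infty$, the boundary $\bar y$ is attracting. Steps 1-2 proved that $\bar y$ is an unattainable boundary of a regular process on $[\underline y, \bar y)$; thus, $J[\underline y, \bar y] = \infty$ by KT81's Lemma 6.2. Because also $J[\underline y, \bar y] + K(\underline y, \bar y] = S(\underline y, \bar y]M(\underline y, \bar y]$ by KT81's Lemma 6.3-v, $0 < S(\underline y, \bar y] < \infty$ forces $M(\underline y, \bar y] = \infty$. By KT81's Lemma 6.3-vi, this implies $K(\underline y, \bar y] = \infty$. Thus, $\bar y$ is a natural (Feller) boundary and so their Theorem 7.2-ii conveniently says $\lim_{p \uparrow \bar y}\lim_{q \uparrow \bar y} E[e^{-rT_{qp}}] = 0$. Both limits exist as the expectation is monotone in $p$ and $q$, and lies in $[0, 1]$. Respecting the order of limits, this means: For each $\varepsilon > 0$, there exist $p_\varepsilon$ and $q_\varepsilon$, with $\underline y < p_\varepsilon < q_\varepsilon < \bar y$, such that $E[e^{-rT_{q_\varepsilon p_\varepsilon}}] < \varepsilon$. Since $\underline p < p_\varepsilon < q_\varepsilon$, the expected gross discounted returns starting from $q_\varepsilon$ are $R(q_\varepsilon) = E[e^{-rT_{q_\varepsilon \underline p}}]\pi(\underline p) \le E[e^{-rT_{q_\varepsilon p_\varepsilon}}]\pi(\underline p) < \varepsilon\pi(\underline p)$.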

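Since the $\mathcal{DM}$ stops only at the lower threshold $\underline p$, the expected total discounted experimentation costs $K(q_\varepsilon)$ are positive for all $q_\varepsilon > \underline p$, increasing as $q_\varepsilon$ rises to $\bar y$ as $\varepsilon \to 0$, and hence boundedly positive. Thus, the value $v(q_\varepsilon) = R(q_\varepsilon) - K(q_\varepsilon) < \varepsilon\pi(\underline p) - K(q_\varepsilon) < \pi(q_\varepsilon)$ for small enough $\varepsilon > 0$, because either $c(0) > 0$ and thus $K(q_\varepsilon) > 0$, or $\pi(q_\varepsilon) > 0$, or both. This contradicts the assumption $q_\varepsilon \in (\underline y, \bar y) \subset (\underline p, \bar p)$. □

¹⁴Here's a proof. First, for any $t > 0$, if $T > 0$ is a stopping time, then so is $t \wedge T = \min(t, T)$. Since $\Pr(T_{\underline y \underline p} < \infty) > 0$, we have $t \wedge T_{\underline y \underline p} = T_{\underline y \underline p}$ with positive chance for large enough $t < \infty$. Unless $t \wedge T_{\underline y \underline p} \wedge T_{\underline y y''} = T_{\underline y y''}$ with positive probability for some $y'' > \underline y$ (and so $y''$ is hit in FTPP from $\underline y$), $\Pr(t \wedge T_{\underline y \underline p} \wedge T_{\underline y y''} = T_{\underline y \underline p}) = \Pr(t \wedge T_{\underline y \underline p} = T_{\underline y \underline p}) > 0$ for all $y'' > \underline y$. Then $E[p_{t \wedge T_{\underline y \underline p} \wedge T_{\underline y y''}} \mid \underline y] \le \underline y + \Pr(t \wedge T_{\underline y \underline p} = T_{\underline y \underline p})[\underline p - \underline y] < \underline y$. But Wald's Optional Stopping Theorem guarantees equality because $(p_t)$ is a martingale.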

c. Finite Hitting Times. In summary, regularity only fails if the intensity level "nearly vanishes" around some interior belief $\bar y \in (\underline p, \bar p)$. This implies that the belief process could approach $\bar y$, WLOG from below, only at a vanishing speed; and by the martingale property, the speed also vanishes moving downward, towards the only attainable threshold $\underline p$. But then all discounted payoffs starting at belief $p_0 \in (\underline p, \bar y)$ come from hitting $\underline p$, which vanish as $p_0$ nears $\bar y$. The $\mathcal{DM}$ then wishes to stop immediately, contradicting $p_0 \in (\underline p, \bar p)$.

Theorem 2 Assume $\pi > 0$ in $[0, 1]$, or $c(0) > 0$, or both. If a solution $(n(\cdot), \underline p, \bar p)$ to OCS exists, then the expected time to hit $\underline p$ or $\bar p$ is finite. The hitting time is thus a.s. finite.

Proof Step 1: Both boundaries $\underline p, \bar p$ are attainable from any interior belief: Indeed, optimality at $p_0 \in (\underline p, \bar p)$ precludes $\Pr(T_{p_0 \underline p} < \infty) = \Pr(T_{p_0 \bar p} < \infty) = 0$. If either is zero, say $\bar p$ is unattainable in FTPP, then by the logic of Step 3 above, $v(p_0) < \pi(p_0)$ for $p_0$ close to $\bar p$.

Proof Step 2: With attainable boundaries, $J(\underline p, p], J[p, \bar p) < \infty$ by KT81's Lemma 6.2. But by Proposition 5.5.32-i in KS91,¹⁵ these two inequalities are iff for our result assuming (i) nondegeneracy (ND'), or a positive diffusion term $n\Sigma > 0$ in $(\underline p, \bar p)$, and (ii) local integrability (LI'), or $\forall p_0 \in (\underline p, \bar p)\ \exists \varepsilon > 0$ with $M[p_0 - \varepsilon, p_0 + \varepsilon] = \int_{p_0 - \varepsilon}^{p_0 + \varepsilon} dy/[n(y)\Sigma(y)] < \infty$. Clearly, (ND') obtains because $n(p) > 0$ by optimality, and $\Sigma(p) > 0$ inside $(0, 1)$.

For a contradiction, assume (LI') fails for some $p_0 \in (\underline p, \bar p)$. Then $M[p_0 - \varepsilon, p_0 + \varepsilon] = \infty$ for all $\varepsilon > 0$, say small enough that $\underline p < p_0 - \varepsilon < p_0 + \varepsilon < \bar p$. Next, all points in $[p_0 - \varepsilon, p_0 + \varepsilon]$ are attainable in FTPP from $p_0$, as the diffusion is regular in its superinterval $(\underline p, \bar p)$. Thus, $J[p_0 - \varepsilon, p_0 + \varepsilon] < \infty$ by attainability, while $M[p_0 - \varepsilon, p_0 + \varepsilon] = \infty$. By KT81's classification (end of §15.6-b), both $p_0 \pm \varepsilon$ are then exit boundaries of $[p_0 - \varepsilon, p_0 + \varepsilon]$. The contradiction proof from Claim 12 (step 3) with KT81's Theorem 7.2-ii still applies. □

B.3  Verification  Theorem:  Existence  and  Uniqueness  of  a  Solution. 

a.  Admissibility Conditions.  We check the conditions (see the definition in §B.1-c) for our candidate Markov control $n(p) = f(rv(p))$ to be admissible. Until §B.4, we assume the non-R&D and non-null action case with $\pi \neq 0$, to avoid some technical issues.

•  Positivity.  This has already been addressed at the outset of §A.2.

•  Measurability.  The composition of a Borel measurable map with any measurable map $X_t$ preserves $\mathcal{F}_t$ measurability (Ø95, Lemma 2.1); also, being constant in time, $n(\cdot)$ is measurable in the time Borel sets. So for any $\mathcal{B}_{[0,1]}$-measurable function $n(\cdot)$, the control process $n(p_t(\omega))$ is $\mathcal{B}_{[0,\infty)} \otimes \mathcal{F}$-measurable, and $\mathcal{F}^X_t$- and $\mathcal{F}_t$-adapted, like $p_t(\omega)$. Since the Markov control function $f(rv(p))$ is continuous, it is therefore $\mathcal{B}_{[0,1]}$-measurable.

[Footnote to Proposition 5.5.32-(i) of KS91, cited in the proof of Theorem 2: In our notation, their result says: The stopping time $T$ is finite a.s. iff both boundaries are attainable [$J(\underline{p}, \bar{p}] < \infty$ and $J[\underline{p}, \bar{p}) < \infty$ (same as their $v$ function)]; in this case, more strongly $E(T) < \infty$.]

•  Strong Existence and Uniqueness (SEU).  Given our candidate optimal policy $\{f(rv), \underline{p}, \bar{p}\}$, the belief process has a unique strong solution. Indeed, replace the functional $n(t, \cdot)$ with our candidate Markov control $f(rv(p_t))$ in (12), and substitute into (15) for $W_t(\omega)$, and then $p_t(\omega)$. Suppressing $\omega$,

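$$dp_t = p_t(1-p_t)\,(2\mu/\sigma)\,\Big\{ [f(rv(p_t))/\sigma]\,[m - \mu(2p_t - 1)]\,dt + \sqrt{f(rv(p_t))}\,dW_t \Big\} \tag{16a}$$

$$dp_t = \begin{cases} p_t(1-p_t)^2\,\zeta^2 f(rv(p_t))\,dt + p_t(1-p_t)\,\zeta\sqrt{f(rv(p_t))}\,dW_t \\[4pt] -p_t^2(1-p_t)\,\zeta^2 f(rv(p_t))\,dt + p_t(1-p_t)\,\zeta\sqrt{f(rv(p_t))}\,dW_t \end{cases} \tag{16b}$$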

where $\zeta = 2\mu/\sigma$. As the probability space and Wiener process $W$ are primitives, we need only verify that there exists a unique strong (KS91, §5.2.1) solution to each corresponding SDE $dp_t = \beta_\theta(p_t)\,dt + \alpha(p_t)\,dW_t$ on $\Omega_\theta$, $\theta = L, H$, where $\alpha$ and $\beta_\theta$ are implicitly defined by (16b). If so, the process $(p_t(\omega))$ solving (16a) is a diffusion, being a mixture (weights $p_0$, $1 - p_0$) of the conditional diffusions (16b), and so inherits the defining properties (KS91, §5.1.1). An adapted control process $(f(rv(p_t(\omega))))$ is induced, and a uniquely defined observation process $X_t$, after substitution into (12).

Because our belief SDEs are defined on a bounded open interval $(\underline{p}, \bar{p})$, all requirements below need only hold up to any hitting time $T$, at which point the process is absorbed. Existence of a weak solution on $(\underline{p}, \bar{p})$, i.e. up to any hitting time, for $\theta = L, H$ follows by Skorohod's Theorem (KS91, Theorem 5.4.22) from $p_0 \in (0,1)$, and the continuity and boundedness of $\alpha$ and $\beta_\theta$ on $(\underline{p}, \bar{p})$. Just as with ODEs, an SDE is uniquely soluble given a Lipschitz condition, here joint on the drift and variance terms, $\beta_\theta$ and $\alpha$. This requirement is met because $f$ and $\sqrt{f}$ Lipschitz (by Claim 3) and $v$ differentiable imply that $f(rv(p))$ and $\sqrt{f(rv(p))}$ are Lipschitz in $p \in (\underline{p}, \bar{p})$. (A proof mimics that of the composition chain rule in calculus.) By Proposition 5.2.13 in KS91 (with $h(x) = x$ in their equation (2.25)), this implies strong uniqueness and then pathwise uniqueness (by KS91, Remark 5.3.3). Skipping around their book, along with weak existence, this yields SEU (KS91, p. 310).
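The mixture claim above can be checked numerically. What follows is only a minimal sketch, not the paper's method: it simulates the conditional diffusions (16b) by an Euler-Maruyama scheme, absorbing each path at a threshold. The control `control(p)`, the thresholds, and $\zeta$ are hypothetical inputs chosen for illustration, and reading the first line of (16b) as state $H$ is an assumption. If the mixture with weights $p_0$, $1-p_0$ is the belief martingale, the fraction of paths absorbed at the upper threshold should be close to $(p_0 - \underline{p})/(\bar{p} - \underline{p})$, up to discretization and sampling error.

```python
import math
import random


def hit_threshold(p0, lo, hi, zeta, control, state, rng, dt=0.005, max_steps=200_000):
    """Euler-Maruyama path of the conditional belief SDE (16b), absorbed at lo or hi.

    `state` is "H" or "L"; treating the first line of (16b) as state H is an
    assumption of this sketch. `control` is a hypothetical positive Markov control.
    """
    p = p0
    for _ in range(max_steps):
        if p <= lo:
            return lo
        if p >= hi:
            return hi
        n = control(p)
        if state == "H":
            drift = p * (1.0 - p) ** 2 * zeta ** 2 * n
        else:
            drift = -(p ** 2) * (1.0 - p) * zeta ** 2 * n
        vol = p * (1.0 - p) * zeta * math.sqrt(n)
        p += drift * dt + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return p  # not absorbed within max_steps


def upper_hit_fraction(p0, lo, hi, zeta, control, rng, paths=4000):
    """Share of mixture paths (state H with chance p0) absorbed at the upper threshold."""
    hits = 0
    for _ in range(paths):
        state = "H" if rng.random() < p0 else "L"
        if hit_threshold(p0, lo, hi, zeta, control, state, rng) == hi:
            hits += 1
    return hits / paths


if __name__ == "__main__":
    rng = random.Random(12345)
    # Hypothetical inputs: thresholds 0.3 and 0.8, zeta = 1, control n(p) = 1 + p.
    estimate = upper_hit_fraction(0.5, 0.3, 0.8, 1.0, lambda p: 1.0 + p, rng)
    print("simulated:", estimate, "martingale benchmark:", (0.5 - 0.3) / (0.8 - 0.3))
```

The scheme is first-order only, so a small bias from overshooting the thresholds remains for any fixed step `dt`.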

b.  Value Function Characterization and Verification of the OCS Policy.

Theorem 3  The supremum value $V$ of OCS (14)-(15) equals $v$, and $\{p : V(p) > \pi(p)\} = (\underline{p}, \bar{p})$, where $\{v, \underline{p}, \bar{p}\}$ solves $\mathcal{EU}$ (5)-(7).

Proof Step 1: Control.  The supremum value $V(p)$ of OCS clearly exists, and by Lemma 1 it uniquely defines a continuation region $\mathcal{E} = (\underline{p}^*, \bar{p}^*) \subseteq (0,1)$, by $[0, \underline{p}^*] \cup [\bar{p}^*, 1] = \{p \in [0,1] : V(p) = \pi(p)\}$. For any admissible Markov control $n(p)$ and resulting belief process $(p_t(\omega))$, define $V(p_0)$ as in (14), except using stopping time $T(\omega \mid \underline{p}^*, \bar{p}^*)$. Since $\pi$ is continuous, this is a pure control problem with given boundaries $\underline{p}^*, \bar{p}^*$.

For a sufficiency verification theorem, we check three conditions. First, there exists a solution $v(p)$ of the $\mathcal{HJB}$ problem with value matching only at $\underline{p}^*, \bar{p}^*$. By Claim 9, a solution $v(p)$ to the ODE $v''(p) = c'(f(rv(p)))/\Sigma(p)$ and value matching (6) exists for any thresholds in $[0,1]$, such as $\underline{p}^*, \bar{p}^*$, on the boundary of the set $V(p) = \pi(p)$. Second, $v \in C^2$ in $[\underline{p}, \bar{p}]$ because $\Sigma$ is continuous in $[\underline{p}, \bar{p}]$, and $\xi = c'(f)$ is continuous on $[0, \infty)$ by Claim 2, so $v''(p)$ is continuous. Third, the family of functions $\{v(p_t)\}_{t \le T}$ is uniformly integrable for all Markov controls $n(\cdot)$, any stopping time realization $T$, and any process $(p_t)$ starting in $(\underline{p}, \bar{p})$, simply because $|v(p_t)|$ is bounded (see Ø95, Appendix C.3). Therefore, by Theorem 11.2 in Ø95: $V(p) = v(p) \in C^2$ in $[\underline{p}, \bar{p}]$, and there exists an optimal (WLOG Markov, as proved in §B.1.c) control $n(p) = f(rv(p)) = f(rV(p))$ for OCS.

Proof Step 2: Stopping.  Next, replace $n(p)$ in (13) with our candidate $f(rV(p))$. Given this controlled diffusion $p_t(\omega)$, consider the resulting pure optimal stopping problem:

$$V(p_0) = \sup_{T \ge 0} E\left[ \int_0^T -c(f(rV(p_s)))\,e^{-rs}\,ds + e^{-rT}\pi(p_T(\omega)) \right]$$

with $p_0(\omega) = p_0$, where the value of this stopping problem is our supremum $V(p)$, because $f(rV(p))$ is an optimal control for OCS, from Step 1. By Theorem 3.15 in Shiryayev (1978) (S78), extended to a continuous flow cost $c(f(rV(\cdot)))$ and final payoff $\pi(\cdot)$, and geometric discounting by his Remark 3.8.3, for all $p \in \mathcal{E}$, $V$ solves the generalized Stefan problem $rV(p) = -c(f(rV(p))) + f(rV(p))\Sigma(p)V''(p)$ s.t. value matching at $\partial\mathcal{E}$. Hence, $V'' > 0$ in $\mathcal{E}$, and $V = \pi$ in $[0,1] \setminus \mathcal{E}$ (where $V'' = 0$). Therefore $V$ is continuous and convex in $[0,1]$, and by Theorem 24.1 in R70 it has right and left derivatives in $[0,1]$, and so at $\underline{p}^*, \bar{p}^*$. Also, as shown in §B.2.a, both boundaries are attracting for any positive control. The necessity of smooth pasting then follows from Theorem 3.16 in S78, i.e. $V$ solves $\mathcal{ST}$, i.e. $rv(p) = -c(n(p)) + n(p)\Sigma(p)v''(p)$, plus (6)-(7), for the given control $n(p) = f(rV(p))$.

Summing up, the triple $\{V, \underline{p}^*, \bar{p}^*\}$ solves $\mathcal{HJB}$ (Step 1) and $\mathcal{ST}$ (Step 2). As sketched after Proposition 1, the Bellman equation (5) has a unique maximizer $f(rv)$, and so is equivalent to $v'' = c'(f(rv(p)))/\Sigma(p)$. Thus, $\mathcal{HJB}$+$\mathcal{ST}$ are jointly equivalent to $\mathcal{EU}$, and a unique solution $\{v(p), \underline{p}, \bar{p}\}$ exists; therefore $v(p) = V(p)$, $\underline{p}^* = \underline{p}$, $\bar{p}^* = \bar{p}$, as asserted.  □

Theorem 4  Assume $\pi > 0$ in $[0,1]$, or $c(0) > 0$, or both. The Markov control policy $n(p) = f(rv(p))$ and thresholds $\underline{p}, \bar{p}$ from $\mathcal{EU}$ are optimal for the OCS problem.

Proof: Since $v(p) = V(p)$ and $f(rV(p))$ is an optimal control (Step 1 of Theorem 3), the $\mathcal{EU}$ control $f(rv(p))$ is not only admissible (§B.3.a) but also optimal for OCS. By Theorem 3.3 in S78, the stopping time $T = T_{p_0,\underline{p}} \wedge T_{p_0,\bar{p}}$, and then the whole $\mathcal{EU}$ policy, is optimal for OCS if $T < \infty$ a.s., which we now establish. First, $f(rv(p)) > 0$ for all $p \in (\underline{p}, \bar{p})$ implies that conditions ($ND'$) and ($LI'$) in KS91, §5.5 hold for the resulting controlled belief SDE (see the proof of Theorem 2), and $S(\underline{p}, \bar{p}] < \infty$, $S[\underline{p}, \bar{p}) < \infty$. Therefore by Proposition 5.5.32-(i) in KS91, $T < \infty$ a.s. iff $J(\underline{p}, \bar{p}] = \int_{\underline{p}}^{\bar{p}} (z - \underline{p})\,dz/[f(rv(z))\Sigma(z)] < \infty$ and $J[\underline{p}, \bar{p}) = \int_{\underline{p}}^{\bar{p}} (\bar{p} - z)\,dz/[f(rv(z))\Sigma(z)] < \infty$. Since $(\underline{p}, \bar{p}) \subset (0,1)$, then $\Sigma(z) > 0$ and these inequalities follow from $f(rv(\cdot)) \gg 0$ in $(\underline{p}, \bar{p})$, which is true given $c(0) > 0$ or $\pi > 0$ always.  □
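The finiteness of the two $J$ integrals under a control bounded away from zero can be illustrated by quadrature. This is a hedged sketch only: the form $\Sigma(z) = z^2(1-z)^2$, the control `control(z)`, and the thresholds are hypothetical stand-ins, since the argument above uses nothing beyond $\Sigma > 0$ and $f(rv(\cdot)) \gg 0$ on the interval.

```python
def sigma(z):
    # Hypothetical variance term; the proof only needs Sigma(z) > 0 on (0, 1).
    return z * z * (1.0 - z) * (1.0 - z)


def j_lower(lo, hi, control, cells=20_000):
    """Midpoint-rule estimate of J(lo, hi] = int_lo^hi (z - lo) / [n(z) Sigma(z)] dz."""
    h = (hi - lo) / cells
    total = 0.0
    for i in range(cells):
        z = lo + (i + 0.5) * h
        total += (z - lo) / (control(z) * sigma(z)) * h
    return total


def j_upper(lo, hi, control, cells=20_000):
    """Midpoint-rule estimate of J[lo, hi) = int_lo^hi (hi - z) / [n(z) Sigma(z)] dz."""
    h = (hi - lo) / cells
    total = 0.0
    for i in range(cells):
        z = lo + (i + 0.5) * h
        total += (hi - z) / (control(z) * sigma(z)) * h
    return total


if __name__ == "__main__":
    lo, hi = 0.3, 0.8                  # hypothetical thresholds
    control = lambda z: 0.5 + z        # hypothetical control, at least 0.8 on [lo, hi]
    # Crude bound: (hi - lo)^2 / (min control * min Sigma) on [lo, hi].
    bound = (hi - lo) ** 2 / (0.8 * min(sigma(lo), sigma(hi)))
    print(j_lower(lo, hi, control), j_upper(lo, hi, control), bound)
```

Both estimates stay below the crude bound, which mirrors why a strictly positive lower bound on the control suffices in the proof.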

c.  Verification:  Uniqueness  of  the  Optimal  Policy  for  OCS. 

Theorem 5  Assume $\pi > 0$ in $[0,1]$, or $c(0) > 0$, or both. Any solution to the OCS problem (14)-(15) solves $\mathcal{EU}$ (5)-(7). Since $\{f(rv(\cdot)), \underline{p}, \bar{p}\}$ solves OCS by Theorem 4, the unique solution of $\mathcal{EU}$ is the unique optimal policy for OCS.

Proof: Because $\pi(p)$ and $V(p)$ are uniquely defined, so is the boundary of the region where $V(p) = \pi(p)$, i.e. $\{\underline{p}^*, \bar{p}^*\} = \{\underline{p}, \bar{p}\}$. Consider the uniqueness of the optimal control $f(rv(p))$. We have proven above that $V(p) = v(p) \in C^2$ in $(\underline{p}^*, \bar{p}^*) = (\underline{p}, \bar{p})$, and in Theorem 2 that the stopping time is finite a.s. for any Markov optimal policy $n(p)$, not just for $f(rv(p))$. So by Theorem 11.1 in Ø95 any optimal $n(p)$ must solve $\mathcal{HJB}$ with value $V(p)$, i.e. $rV(p) = -c(n(p)) + n(p)\Sigma(p)V''(p) = \max_n[-c(n) + n\Sigma(p)V''(p)]$ for $p \in (\underline{p}, \bar{p})$. Thus $c'(n(p)) = \Sigma(p)V''(p) = c'(f(rV(p)))$, and so $n(p) = f(rV(p))$ because $c'$ is strictly increasing.  □
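The first-order step behind this uniqueness argument can be written out as a short derivation. It is a sketch under an assumption inferred from the surrounding proofs rather than stated in this section: that $g(n) \equiv n c'(n) - c(n)$ and that $f$ denotes its inverse.

```latex
\begin{align*}
\max_{n \ge 0}\,\bigl[-c(n) + n\,\Sigma(p)V''(p)\bigr]
  &\;\Longrightarrow\; c'(n(p)) = \Sigma(p)V''(p)
  && \text{(interior first-order condition)} \\
rV(p) &= -c(n(p)) + n(p)\,c'(n(p)) = g(n(p))
  && \text{(substitute into the Bellman equation)} \\
n(p) &= g^{-1}(rV(p)) = f(rV(p))
  && \text{(invert } g \text{, assumed increasing)}
\end{align*}
```

Strict convexity of $c$ makes the maximand strictly concave in $n$, which is why the maximizer is unique.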

B.4  The R&D Model / Null Action Case ($\pi = 0$ Somewhere)

When $\pi(p) = 0$ somewhere and $c(0) = 0$, boundary problems arise: $n(\underline{p}) = f(rv(\underline{p})) = 0$, as in Figure 2 (left), and $\underline{p}$ may be unattainable in FTPP, so Theorem 2 fails.

Theorem 6  Assume payoffs $\pi(p) = 0$ for some $p \in [0,1]$. Then the $\mathcal{EU}$ policy is uniquely optimal if either $\rho = \lim_{w \downarrow 0} w^{1-\eta}\xi'(w)/\xi(w) < \infty$ for some $\eta > 0$, or $c(0) > 0$, or both.

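Proof: Clearly, $f(rv(\cdot)) > 0$ in $(\underline{p}, \bar{p})$, while $f(rv(z)) \gg 0$ near $\bar{p}$ since $\pi(\bar{p}) > 0$ WLOG. As in the proof of Theorem 4, the result obtains iff $J(\underline{p}, \bar{p}] < \infty$, i.e. $\int_{\underline{p}}^{\bar{p}} (z - \underline{p})/f(rv(z))\,dz < \infty$. If $c(0) > 0$ then $f(rv(\cdot)) > 0$ in $(\underline{p}, \bar{p})$ and the integral is finite. If $c(0) = 0$ and $\pi = 0$ somewhere, so that $f(rv(\underline{p})) = 0$, we show this is still true. Indeed, near $\underline{p}$, we have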

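$$\frac{z - \underline{p}}{f(rv(z))} \approx \frac{2v(z)}{f(rv(z))\,(z - \underline{p})\,v''(z)} = \xi'(rv(z))\,\frac{2\Sigma(z)\,rv(z)}{(z - \underline{p})\,r\,\xi(rv(z))} \tag{17}$$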

The approximate equality follows from a Taylor expansion $v(z) \approx v''(z)(z - \underline{p})^2/2$ (valid because $v \in C^2$) with $v(\underline{p}) = v'(\underline{p}) = 0$, the second equality from $v''(z) = \xi(rv(z))/\Sigma(z)$ and from $\xi'(w) = 1/f(w)$, as established in §A.1. Notice that $2\Sigma(z)/r \to 2\Sigma(\underline{p})/r \equiv S \in (0, \infty)$ as $r > 0$ and $\underline{p} \in (0,1)$. Observe that:

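$$\lim_{z \downarrow \underline{p}} \frac{z - \underline{p}}{f(rv(z))}\,(z - \underline{p})^{1-\eta} = S \lim_{z \downarrow \underline{p}} \frac{\xi'(rv(z))\,rv(z)}{\xi(rv(z))\,(z - \underline{p})^{\eta}} = \rho S \lim_{z \downarrow \underline{p}} \left[\frac{rv(z)}{z - \underline{p}}\right]^{\eta} = \rho S \lim_{z \downarrow \underline{p}} \bigl(rv'(z)\bigr)^{\eta} = 0$$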

where the first equality follows from (17), the second from the assumption, and the third from a Taylor expansion $0 = rv(\underline{p}) \approx rv(z) + rv'(z)(\underline{p} - z)$, which is valid because $v \in C^2$. It follows that the integrand $(z - \underline{p})/f(rv(z))$ is bounded above by $(z - \underline{p})^{\eta - 1}$, whose integral on $[\underline{p}, \bar{p}]$ is bounded for any $\eta > 0$, which establishes the claim.
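For concreteness, the integral that bounds $J(\underline{p}, \bar{p}]$ can be evaluated in closed form, and compared with the borderline case $\eta = 0$ in which the bound is lost. This is a standard calculus fact, added here only as a worked illustration:

```latex
\begin{align*}
\int_{\underline{p}}^{\bar{p}} (z - \underline{p})^{\eta - 1}\,dz
  &= \frac{(\bar{p} - \underline{p})^{\eta}}{\eta} < \infty
  && \text{for any } \eta > 0, \\
\int_{\underline{p}}^{\bar{p}} (z - \underline{p})^{-1}\,dz
  &= \lim_{\varepsilon \downarrow 0} \ln\frac{\bar{p} - \underline{p}}{\varepsilon} = \infty
  && \text{when } \eta = 0.
\end{align*}
```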

Remark.  If $c(0) = c'(0) = 0$, then in the proof of Theorem 6, $\xi(rv(z))/rv(z) \approx \xi'(rv(z)) = 1/f(rv(z))$ for $rv(z)$ small. Thus the integrand in that proof is of order $(z - \underline{p})^{-1}$, whence $J(\underline{p}, \bar{p}] = \infty$. Hence, $\Pr(T_{p_0,\underline{p}} \wedge T_{p_0,\bar{p}} < \infty) < 1$, and no optimal stopping time exists. More precisely, the $\mathcal{EU}$ control $n(p) = f(rv(p))$ with the new thresholds $\{\underline{p} + \varepsilon, \bar{p}\}$ is an $\varepsilon$-optimal policy. See Theorem 3.3 in S78.

C.   OMITTED  PROOFS:  SENSITIVITY  ANALYSIS 

C.1  Comparative Statics: Proof of Proposition 6

We first consider payoff shifts, which just affect the boundary conditions of $\mathcal{EU}$. We then employ a different methodology for shifts in $r$, $\zeta$, and $c(\cdot)$ that skew the ODE itself.

•  Proof of Proposition 6(a): Increasing Payoff Levels.  Write $V(p_0) = \max_{T,n} \mathcal{V}(p_0 \mid n, T, \pi_A, \pi_B)$, where the maximand $\mathcal{V}(p_0 \mid \cdot)$, seen on the RHS of (4), is differentiable in $\pi_B^H$ by inspection: $\mathcal{V}_{\pi_B^H}(p_0 \mid \cdot) = E[e^{-rT}\bar{p} \mid p_T = \bar{p}, p_0, n]\,\Pr(p_T = \bar{p} \mid p_0, n) > 0$. Here the event $p_T = \bar{p}$ that the $\mathcal{DM}$ eventually chooses action $B$ occurs with chance $(p_0 - \underline{p})/(\bar{p} - \underline{p}) > 0$, for any $p_0 > \underline{p}$. The other payoff parametric shifts are similarly positive: $\mathcal{V}_{\pi_a^\theta}(p_0 \mid \cdot) > 0$ for all $p_0 > \underline{p}$ for a fixed policy $n, T$. If the $\mathcal{DM}$ re-optimizes after a change in one of these parameters, the supremum value $V$ cannot fall.

Consider  boundary  behavior  at  lower  threshold  belief  p  (p  being  similar),  associating  p. 
and  payoff  parameters  tti  >  ttq.  The  value  function  V^(p|7r)  is  continuously  differentiable  in 
p,  by  smooth  pasting  (7),  and  partially  differentiable  in  n.  Hence,  V{'n\TTo)  —  V{v\'n-i)  ^ 
^7r(PQ|7!"o)(7ro  —  TTj)  is  the  first-order  Taylor  expansion,  as  Vp{n\no)  =  0.  So  n    >  p  .         □ 

•  Proof of Proposition 6(b): Increasing Riskiness.  Rotate the $\pi_B$ payoff line counterclockwise through (current belief, expected payoff), as $\pi_B^L$ falls and $\pi_B^H$ rises. Then the value function cuts into the new $\pi$ frontier on the right. By a left-right reflection of Claim 7-b, $\underline{p}$ must fall to restore a smoothly-pasted tangency of $v_{\underline{p}}(p)$ and $\pi(p)$ on the right side. Also, as Claim 8 asserts, as $\underline{p}$ falls, the value $v_{\underline{p}}(p)$ rises, and gets steeper.  □

Claim 13 (A Key Implication)  Parametrize models by $\Gamma_1, \Gamma_2$, where $\Gamma_i = \{c_i(\cdot), \zeta_i, r_i\}$. Let $\{v_i, \underline{p}_i, \bar{p}_i\}$ denote the corresponding solutions, with $\Psi_i(p, v_i(p)) = c_i'(f_i(r_i v_i(p)))/\Sigma_i(p)$. If

$$v_2 \ge v_1 \;\Longrightarrow\; \Psi_2(p, v_2) > \Psi_1(p, v_1) \tag{18}$$

independently of $p$, then $\underline{p}_1 < \underline{p}_2$, $\bar{p}_1 > \bar{p}_2$, and $v_1(p) > v_2(p)$ for all $p \in [\underline{p}_2, \bar{p}_2]$.

Proof: Put $\Delta v = v_2 - v_1$, and similarly $\Delta v'$, $\Delta v''$, all continuous maps $(\underline{p}_1, \bar{p}_1) \cap (\underline{p}_2, \bar{p}_2) \to \mathbb{R}$.

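Step 1: Ordering Thresholds.  We assume $\underline{p}_1 \ge \underline{p}_2$, and obtain a contradiction. By a symmetric argument $\bar{p}_1 > \bar{p}_2$. First, we show that $\underline{p}_1 \ge \underline{p}_2$ implies $\Delta v(\underline{p}_1) \ge 0$, $\Delta v'(\underline{p}_1) \ge 0$, $\Delta v''(\underline{p}_1) > 0$, so that $\Delta v$ is strictly positive, increasing and convex at $\underline{p}_1 + \varepsilon$, for some small $\varepsilon > 0$. The first two weak inequalities follow at once from value matching and smooth pasting of each $v_i$ at $\underline{p}_i$, along with strict convexity of $v_2$.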

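Next, $\Delta v''(\underline{p}_1) = \Psi_2(\underline{p}_1, v_2(\underline{p}_1)) - \Psi_1(\underline{p}_1, v_1(\underline{p}_1)) > 0$ by (18), as $v_i$ solves the $\Gamma_i$ problem, and $\Delta v(\underline{p}_1) \ge 0$. We claim that $\Delta v$ attains a global maximum at some $\hat{p} \in (\underline{p}_1, \min(\bar{p}_1, \bar{p}_2))$. To see why, consider the upper thresholds. If $\bar{p}_1 < \bar{p}_2$, then $\Delta v'(\bar{p}_1) < 0$ by smooth pasting of $v_1$ at $\bar{p}_1$ and strict convexity of $v_2$. Thus, the function $\Delta v$, strictly increasing at $\underline{p}_1 + \varepsilon$ for some $\varepsilon > 0$, is strictly decreasing at $\bar{p}_1 < \bar{p}_2$: An interior global maximum $\hat{p}$ then exists. If instead $\bar{p}_1 \ge \bar{p}_2$, then $\Delta v(\bar{p}_2) \le 0$ by value matching of each $v_i$ at $\bar{p}_i$; the function $\Delta v$, strictly positive and increasing at $\underline{p}_1 + \varepsilon$, is nonpositive at $\bar{p}_2 \le \bar{p}_1$, and so has an interior global maximum $\hat{p}$. In either case, since $\Delta v(\underline{p}_1 + \varepsilon) > 0$, we deduce the maximum $\Delta v(\hat{p}) > 0$. But then $\Delta v''(\hat{p}) = \Psi_2(\hat{p}, v_2(\hat{p})) - \Psi_1(\hat{p}, v_1(\hat{p})) > 0$ because $\Delta v(\hat{p}) > 0$, as just deduced. This violates the second order condition $\Delta v''(\hat{p}) \le 0$ for a maximum of $\Delta v$ at $\hat{p}$.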

Step 2: Ordering Value Functions.  Value matching, plus $\underline{p}_1 < \underline{p}_2$ and $\bar{p}_1 > \bar{p}_2$ just proven, jointly imply $\Delta v(\underline{p}_2) < 0$ and $\Delta v(\bar{p}_2) < 0$. The claim then obtains near the thresholds. Suppose that $\Delta v(p) \ge 0$ at some $p \in (\underline{p}_2, \bar{p}_2)$. Then $\Delta v$ must attain a global non-negative maximum at one such $\hat{p} \in (\underline{p}_2, \bar{p}_2)$. This requires $\Delta v(\hat{p}) \ge 0 \ge \Delta v''(\hat{p})$, yielding the same contradiction as before.  □

•  Proof of Proposition 6(c): Cost Convexity.  Define $\Delta c(n) = c_2(n) - c_1(n)$. Convexity of $\Delta c$ and $\Delta c(0) = 0$ imply $\Delta c'(n) \ge \Delta c(n)/n$ for all $n > 0$. Equivalently, we have $g_2(n) \ge g_1(n)$, and thus $f_2(w) \le f_1(w)$ for all $w > 0$, so that $\xi_2'(w) = 1/f_2(w) \ge 1/f_1(w) = \xi_1'(w)$. Since $\xi_2(0) > \xi_1(0)$ by assumption, $\xi_2(w) > \xi_1(w)$ for all $w \ge 0$. [Or, if $c_i(0) = 0$, then $f_i(0) = 0$ for $i = 1, 2$, so that $\Delta c'(0+) > 0$ implies $\xi_2(0) = c_2'(f_2(0)) > c_1'(f_1(0)) = \xi_1(0)$.] Thus, (18) holds, because $\Psi_i(\cdot, v_i(\cdot)) = \xi_i(rv_i(\cdot))/\Sigma(\cdot)$, each $\xi_i(\cdot)$ is increasing, and $\xi_2(\cdot) > \xi_1(\cdot)$. This proves value and threshold monotonicity. Since also $f_2(w) \le f_1(w)$, and each $f_i$ is increasing, we have $n_2(p) = f_2(rv_2(p)) < f_2(rv_1(p)) \le f_1(rv_1(p)) = n_1(p)$.  □
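The chain of orderings in this proof can be traced on a simple example. The sketch below assumes hypothetical quadratic costs $c_1(n) = n^2$ and $c_2(n) = 2n^2$, so that $\Delta c(n) = n^2$ is convex with $\Delta c(0) = 0$, and it takes $g(n) = n c'(n) - c(n)$ with inverse $f$ and $\xi(w) = c'(f(w))$ as inferred from the surrounding text. It illustrates the orderings for $w > 0$ only; it does not verify the strict boundary assumption at $w = 0$, where both $\xi_i$ vanish for these costs.

```python
import math


def make_quadratic_cost(a):
    """Objects built from the hypothetical cost c(n) = a * n**2, a > 0."""
    def cost(n):
        return a * n * n

    def marginal(n):                 # c'(n)
        return 2.0 * a * n

    def g(n):                        # g(n) = n c'(n) - c(n) = a n^2
        return n * marginal(n) - cost(n)

    def f(w):                        # inverse of g on [0, inf)
        return math.sqrt(w / a)

    def xi(w):                       # xi(w) = c'(f(w)) = 2 sqrt(a w)
        return marginal(f(w))

    return g, f, xi


if __name__ == "__main__":
    g1, f1, xi1 = make_quadratic_cost(1.0)   # c_1(n) = n^2
    g2, f2, xi2 = make_quadratic_cost(2.0)   # c_2(n) = 2 n^2, the more convex cost
    h = 1e-6
    for w in (0.1, 1.0, 5.0):
        slope = (xi1(w + h) - xi1(w - h)) / (2.0 * h)   # numerical xi_1'(w)
        print(w, f2(w) <= f1(w), xi2(w) >= xi1(w), slope, 1.0 / f1(w))
```

For these costs the more convex technology has the lower experimentation map $f_2 \le f_1$ and the higher $\xi_2 \ge \xi_1$, and the numerical slope of $\xi_1$ agrees with $1/f_1$, consistent with $\xi'(w) = 1/f(w)$.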

•  Proof of Proposition 6(d): Information Quality.  If $\zeta_1 > \zeta_2$, then $\Sigma_1(\cdot) > \Sigma_2(\cdot)$ and therefore the premise of Claim 13 holds: $\Psi_i(\cdot, v_i(\cdot)) = \xi(rv_i(\cdot))/\Sigma_i(\cdot)$, so that $\xi(\cdot)$ increasing and $1/\Sigma_1(\cdot) < 1/\Sigma_2(\cdot)$ imply (18). Hence the value $v$ falls and thresholds shift in. Since $f$ is unchanged, $n_i(p) = f(rv_i(p))$ falls uniformly with $v_i$.  □

Claim 14 (Proof of Proposition 6(e): Impatience)  Given interest rates $r_2 > r_1 > 0$.

(a) The $r_2$-thresholds are shifted in, and the $r_2$-value $v$ lower at all points in its domain;

(b) There exists a possibly empty interval $[\underline{q}, \bar{q}]$, strictly contained in $[\underline{p}_2, \bar{p}_2] \subset [\underline{p}_1, \bar{p}_1]$, with $n_2(p) < n_1(p)$ for all $p \in [\underline{q}, \bar{q}]$, and $n_2(p) > n_1(p)$ otherwise.

Proof of (a): Since $r_2 > r_1$ with all else equal, $\xi$ increasing implies (18) and Claim 13.  □
Proof of (b): Let $w_i(p) = r_i v_i(p)$ for $p \in (\underline{p}_i, \bar{p}_i)$, $i = 1, 2$. Define functions $\Delta w(p)$, $\Delta w'$, $\Delta w''$, $\Delta v$, $\Delta v'$, and $\Delta v''$ all with domain $[\underline{p}_2, \bar{p}_2]$. First, $\underline{p}_1 < \underline{p}_2$ and $\bar{p}_1 > \bar{p}_2$ from (a), smooth pasting, and strict convexity of $v_i$ imply $\Delta v'(\underline{p}_2) < 0 < \Delta v'(\bar{p}_2)$. By continuity, $\Delta v'$ is strictly increasing in some subset $I \subset [\underline{p}_2, \bar{p}_2]$. Hence, for $p \in I$, $\Delta v''(p) = [\xi(w_2(p)) - \xi(w_1(p))]/\Sigma(p) > 0$, and so $w_2(p) > w_1(p)$ and $n_2(p) > n_1(p)$. We show that the complement set, where the folk result $n_2 < n_1$ obtains, is a possibly empty interval.

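By definition of $w$, the return $w_i$ is strictly convex and solves $\Sigma(p)w_i''(p) = r_i\xi(w_i(p))$ s.t. $w_i(\underline{p}_i) = r_i\pi(\underline{p}_i)$, $w_i'(\underline{p}_i) = r_i(\pi_A^H - \pi_A^L)$, $w_i(\bar{p}_i) = r_i\pi(\bar{p}_i)$, $w_i'(\bar{p}_i) = r_i(\pi_B^H - \pi_B^L)$. Thus $w_2'(\underline{p}_2) = r_2(\pi_A^H - \pi_A^L) \le r_1(\pi_A^H - \pi_A^L) = w_1'(\underline{p}_1) < w_1'(\underline{p}_2)$, where the first equality is smooth pasting, the weak inequality follows from $r_2 > r_1$ and $\pi_A^H \le \pi_A^L$, and the strict inequality from $\underline{p}_2 > \underline{p}_1$ and $w_1'' > 0$. By a symmetric argument and $\pi_B^H > \pi_B^L$, $w_2'(\bar{p}_2) > w_1'(\bar{p}_2)$. Therefore the smooth function $\Delta w$ is strictly decreasing at $\underline{p}_2$ and increasing at $\bar{p}_2$. Since we have shown that $\Delta w$ is strictly positive in a non-empty set $I \subset [\underline{p}_2, \bar{p}_2]$, it suffices to show that $\Delta w$ cannot have a local non-negative maximum in $(\underline{p}_2, \bar{p}_2)$. By contradiction, suppose that $\Delta w(\hat{p}) \ge 0 = \Delta w'(\hat{p}) \ge \Delta w''(\hat{p})$ for some $\hat{p} \in (\underline{p}_2, \bar{p}_2)$. Then $\hat{p} \in (\underline{p}_2, \bar{p}_2) \subset [0,1]$ implies $\Sigma(\hat{p}) > 0$; furthermore $r_2 > r_1$, $\xi(\cdot)$ is increasing and $\Delta w(\hat{p}) \ge 0$, so that the familiar contradiction follows: $\Delta w''(\hat{p}) = [\Sigma(\hat{p})]^{-1}[r_2\xi(w_2(\hat{p})) - r_1\xi(w_1(\hat{p}))] > 0 \ge \Delta w''(\hat{p})$.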
Finally, we specialize to the R&D payoff specification. Here, $v_2(\underline{p}_2) = 0$, and then $\Delta w(\underline{p}_2) = w_2(\underline{p}_2) - w_1(\underline{p}_2) = -w_1(\underline{p}_2) < 0$ by $\underline{p}_2 > \underline{p}_1$. Therefore $\Delta w$ is initially strictly negative and declining; since it must become strictly positive at some point below $\bar{p}_2$, and it cannot have a local non-negative maximum, it cannot change sign twice; so $\Delta n$ changes sign exactly once, going from negative to positive as we raise $p$. Hence, there is an interior cutoff $p' \in (\underline{p}_2, \bar{p}_2)$ such that $n(p)$ declines for all beliefs $p < p'$, and rises for all $p > p'$.  □

C.2  Vanishing  /  Exploding  Convexity:  Completion  of  Proof  of  Proposition  7 

Assume $n_2 > n_1$, with $\lambda_k \to \infty$. By Claim 1, $g_k(n_2) - g_k(n_1) \ge n_1\lambda_k(n_2 - n_1)$. Then $g_k(n) \ge g_k(0) + \int_0^n dg_k(n') = 0 + \int_0^n \lambda_k n'\,dn' = \lambda_k n^2/2 \to \infty$ as $k \to \infty$. Similarly, $g_k(n) \to 0$ as $k \to \infty$ when $\lambda_k \to 0$. Hence, the inverse $f_k$ of $g_k$ explodes away from 0. Since $v \ge \pi \ge 0$, $n_k(p) = f_k(rv(p)) \ge f_k(r\pi(p))$ explodes. The proof for $\lambda_k \to 0$ is symmetric.  □